Abstract

We study a new framework for property testing of probability distributions, by considering
distribution testing algorithms that have access to a conditional sampling oracle. This is an
oracle that takes as input a subset S ⊆ [N] of the domain [N] of the unknown probability
distribution D and returns a draw from the conditional probability distribution D restricted to
S. This new model allows considerable flexibility in the design of distribution testing algorithms;
in particular, testing algorithms in this model can be adaptive.

We study a wide range of natural distribution testing problems in this new framework and
some of its variants, giving both upper and lower bounds on query complexity. These problems
include testing whether D is the uniform distribution U; testing whether D = D* for an
explicitly provided D*; testing whether two unknown distributions D_1 and D_2 are equivalent;
and estimating the variation distance between D and the uniform distribution. At a high level
our main finding is that the new "conditional sampling" framework we consider is a powerful
one: while all the problems mentioned above have Ω(√N) sample complexity in the standard
model (and in some cases the complexity must be almost linear in N), we give poly(log N, 1/ε)-query
algorithms (and in some cases poly(1/ε)-query algorithms independent of N) for all these
problems in our conditional sampling setting.

1 Introduction



1.1 Background: Distribution testing in the standard model 

One of the most fundamental problem paradigms in statistics is that of inferring some information
about an unknown probability distribution D given access to independent samples drawn from it.
More than a decade ago, Batu et al. [BFR+00]¹ initiated the study of problems of this type from
within the framework of property testing [RS96, GGR98]. In a property testing problem there is an
unknown "massive object" that an algorithm can access only by making a small number of "local
inspections" of the object, and the goal is to determine whether the object has a particular
property. The algorithm must output ACCEPT if the object has the desired property and output
REJECT if the object is far from every object with the property. (See [Fis01, Ron08, Ron10, Gol10]
for detailed surveys and overviews of the broad field of property testing; we give precise definitions
tailored to our setting in Section 2.)

In distribution property testing the "massive object" is an unknown probability distribution D
over an N-element set, and the algorithm accesses the distribution by drawing independent samples
from it. A wide range of different properties of probability distributions have been investigated
in this setting, and upper and lower bounds on the number of samples required have by now
been obtained for many problems. These include testing whether D is uniform [GR00, BFR+10,
Pan08], testing whether D is identical to a given known distribution D* [BFF+01], testing whether
two distributions D_1, D_2 (both available via sample access) are identical [BFR+10, Val11], and
testing whether D has a monotonically increasing probability mass function [BFRV11], as well as
related problems such as estimating the entropy of D [BDKR05, VV11], and estimating its support
size [RRSS09, Val11, VV11]. Similar problems have also been studied by researchers in other
communities, see e.g., [Ma81, Pan04, Pan08].

One broad insight that has emerged from this past decade of work is that while sublinear-sample
algorithms do exist for many distribution testing problems, the number of samples required is in
general quite large. Even the basic problem of testing whether D is the uniform distribution U over
[N] = {1, ..., N} versus ε-far from uniform requires Ω(√N) samples² for constant ε, and the other
problems mentioned above have sample complexities at least this high, and in some cases almost
linear in N [RRSS09, Val11, VV11]. Since such sample complexities could be prohibitively high
in real-world settings where N can be extremely large, it is natural to explore problem variants
where it may be possible for algorithms to succeed using fewer samples. Indeed, researchers have
studied distribution testing in settings where the unknown distribution is guaranteed to have some
special structure, such as being monotone, k-modal or a "k-histogram" over [N] [BKR04, DDS+13,
ILR12], or being monotone over {0, 1}^n [RS09] or over other posets [BFRV11], and have obtained
significantly more sample-efficient algorithms using these additional assumptions.



¹ There is a more recent full version of this work [BFR+10] and we henceforth reference this recent version.
² To verify this, consider the family of all distributions that are uniform over half of the domain, and 0 elsewhere.
Each distribution in this family is Ω(1)-far from the uniform distribution. However, it is not possible to distinguish
with sufficiently high probability between the uniform distribution and a distribution selected randomly from this
family, given a sample of size √N/c (for a sufficiently large constant c > 1). This is the case because for the uniform
distribution as well as each distribution in this family, the probability of observing the same element more than once
is very small. Conditioned on such a collision event not occurring, the samples are distributed identically.

1.2 Our model: Conditional sampling



In this work we pursue a different line of investigation: rather than restricting the class of probability
distributions under consideration, we consider testing algorithms that may use a more powerful form
of access to the unknown distribution D. This is a conditional sampling oracle, which allows the
algorithm to obtain a draw from D_S, the conditional distribution of D restricted to a subset S of
the domain (where S is specified by the algorithm). More precisely, we have:

Definition 1 Fix a distribution D over [N]. A COND oracle for D, denoted COND_D, is defined as
follows: The oracle is given as input a query set S ⊆ [N] that has D(S) > 0. The oracle returns an
element i ∈ S, where the probability that element i is returned is D_S(i) = D(i)/D(S), independently
of all previous calls to the oracle.³

For compatibility with our COND_D notation we will write SAMP_D to denote an oracle that
takes no input and, each time it is invoked, returns an element from [N] drawn according to D
independently from all previous draws. This is the sample access to D that is used in the standard
model of testing distributions, and this is of course the same as a call to COND_D([N]).
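To make Definition 1 concrete, the following is a minimal simulation sketch of the two oracles, assuming the distribution is available to the simulator as an explicit list of probabilities. The names `make_cond_oracle` and `make_samp_oracle` are hypothetical helpers introduced only for illustration; raising an exception on a zero-weight query set is one possible reading of the "failure" convention.

```python
import random

def make_cond_oracle(probs, rng=None):
    """Return a function cond(S) simulating COND_D, where D(i) = probs[i - 1].

    Illustrative helper (not from the paper): the simulator knows D explicitly,
    whereas a testing algorithm would only see the returned draws.
    """
    rng = rng or random.Random()

    def cond(S):
        elems = sorted(S)
        weights = [probs[i - 1] for i in elems]
        if sum(weights) <= 0:
            # D(S) = 0: the oracle's behavior is undefined; we signal failure.
            raise ValueError("COND queried on a set of zero probability")
        # Element i is returned with probability D(i) / D(S).
        return rng.choices(elems, weights=weights, k=1)[0]

    return cond

def make_samp_oracle(probs, rng=None):
    """SAMP_D is the special case COND_D([N])."""
    cond = make_cond_oracle(probs, rng)
    domain = range(1, len(probs) + 1)
    return lambda: cond(domain)
```

For instance, with D = (1/2, 1/4, 1/4, 0) a query on S = {2, 3} returns 2 or 3 with probability 1/2 each, while a query on S = {1, 4} always returns 1.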

Motivation and Discussion. The first motivation for studying the COND model is to capture 
scenarios that arise in application areas (e.g., in biology or chemistry) where the parameters of 
some experiment can be adjusted so as to restrict the range of possible outcomes. For example, 
a scientist growing bacteria or yeast cells in a controlled environment may be able to deliberately 
introduce environmental factors that allow only cells with certain desired characteristics to survive, 
corresponding to restricting the distribution of all experimental outcomes to a pre-specified subset. 

A second, purely theoretical motivation, is that the study of the COND model may further our 
understanding regarding what forms of information (beyond standard sampling) can be helpful for 
testing properties of distributions. In both learning and property testing it is generally interesting 
to understand how much power algorithms can gain by making queries, and COND queries are a 
natural type of query to investigate in the context of distributions. As we discuss in more detail 
below, in several of our results we actually consider restricted versions of COND queries that do 
not require the full power of obtaining conditional samples from arbitrary sets. 

A third attractive feature of the COND model is that it enables a new level of "richness" for
algorithms that deal with probability distributions. In the standard model where only access to
SAMP_D is provided, all algorithms must necessarily be non-adaptive, with the same initial step of
simply drawing a sample of points from SAMP_D, and the difference between two algorithms comes
only from how they process their samples. In contrast, the essence of the COND model is to allow
algorithms to adaptively determine later query sets S based on the outcomes of earlier queries.

³ Note that as described above the behavior of COND_D(S) is undefined if D(S) = 0, i.e., the set S has zero
probability under D. While various definitional choices could be made to deal with this, we shall assume that in
such a case, the oracle (and hence the algorithm) outputs "failure" and terminates. This will never be a problem for
us throughout this paper, as (a) all of our lower bounds will deal only with distributions that have D(i) > 0 for all
i ∈ [N], and (b) in all of our algorithms COND_D(S) will only ever be called on sets S which are "guaranteed" to have
D(S) > 0. (More precisely, each time an algorithm calls COND_D(S) it will either be on the set S = [N], or will be
on a set S which contains an element i which has been returned as the output of an earlier call to COND_D.)

Given the above motivations, the central question is whether the COND model enables significantly
more efficient algorithms than are possible in the weaker SAMP model. Our results (see
Section 1.3) show that this is indeed the case.

Before detailing our results, we note that many of our results will in fact deal with a weaker 
variant of the COND model, which we now describe. In designing COND-model algorithms it is 
obviously desirable to have algorithms that only invoke the COND oracle on query sets S which are 
"simple" in some sense. Of course there are many possible notions of "simplicity" ; in this work we 
consider two such notions, corresponding to two restrictions of the general COND model, as follows: 

• PCOND oracle: We define a PCOND (short for "pair-cond") oracle for D as a restricted
version of COND_D that only accepts input sets S which are either S = [N] (thus providing
the power of a SAMP_D oracle) or S = {i, j} for some i, j ∈ [N], i.e. sets of size two. The
PCOND oracle may be viewed as a "minimalist" variant of COND that essentially permits
an algorithm to compare the relative weights of two items under D (and to draw random
samples from D, by setting S = [N]).

• ICOND oracle: We define an ICOND (short for "interval-cond") oracle for D as a restricted
version of COND_D that only accepts input sets S which are intervals S = [a, b] = {a, a + 1,
..., b} for some a ≤ b ∈ [N] (note that taking a = 1, b = N this provides the power of a
SAMP_D oracle). This is a natural restriction on COND queries in settings where the N points
are endowed with a total order.
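The two restrictions can be sketched as thin wrappers that validate the query set before drawing a conditional sample. This is an illustrative simulation only: `cond_draw`, `make_pcond_oracle` and `make_icond_oracle` are hypothetical names, and the distribution is assumed to be given to the simulator as a list of probabilities.

```python
import random

def cond_draw(probs, S, rng):
    """Draw from D restricted to S, where D(i) = probs[i - 1] (illustrative)."""
    elems = sorted(S)
    weights = [probs[i - 1] for i in elems]
    if sum(weights) <= 0:
        raise ValueError("query set has zero probability")
    return rng.choices(elems, weights=weights, k=1)[0]

def make_pcond_oracle(probs, rng):
    """PCOND_D: accepts only S = [N] (pass S=None) or a set of two points."""
    N = len(probs)

    def pcond(S=None):
        if S is None:
            return cond_draw(probs, range(1, N + 1), rng)
        S = set(S)
        if len(S) != 2 or not all(1 <= i <= N for i in S):
            raise ValueError("PCOND accepts only [N] or sets of size two")
        return cond_draw(probs, S, rng)

    return pcond

def make_icond_oracle(probs, rng):
    """ICOND_D: accepts only intervals [a, b] with 1 <= a <= b <= N."""
    N = len(probs)

    def icond(a, b):
        if not (1 <= a <= b <= N):
            raise ValueError("ICOND accepts only intervals [a, b] within [N]")
        return cond_draw(probs, range(a, b + 1), rng)

    return icond
```

In this sketch `pcond()` with no argument and `icond(1, N)` both behave like a SAMP_D call, matching the remarks in the two bullets above.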

1.3 Our results 

We give a detailed study of a range of natural distribution testing problems in the COND model and
its variants described above, establishing both upper and lower bounds on their query complexity.
Our results show that the ability to do conditional sampling provides a significant amount of power
to property testers, enabling polylog(N)-query, or even constant-query, algorithms for problems
whose sample complexities in the standard model are N^{Ω(1)}; see Table 1. While we have considered a
variety of distribution testing problems in the COND model, our results are certainly not exhaustive,
and many directions remain to be explored; we discuss some of these in Section 9.

1.3.1 Testing distributions over unstructured domains 

In this initial work on the COND model our main focus has been on the simplest (and, we think,
most fundamental) problems in distribution testing, such as testing whether D is the uniform
distribution U; testing whether D = D* for an explicitly provided D*; testing whether D_1 = D_2
given COND_{D_1} and COND_{D_2} oracles; and estimating the variation distance between D and the
uniform distribution. In what follows d_TV denotes the variation distance.

Testing uniformity. We give a PCOND_D algorithm that tests whether D = U versus
d_TV(D, U) ≥ ε using Õ(1/ε^2) calls to PCOND_D, independent of N. We show that this PCOND_D
algorithm is nearly optimal by proving that any COND_D tester (which may use arbitrary subsets
S ⊆ [N] as its query sets) requires Ω(1/ε^2) queries for this testing problem.

Table 1: Comparison between the COND model and the standard model on a variety of distribution
testing problems over [N]. The upper bounds for the first three problems are for testing whether
the property holds (i.e. d_TV = 0) versus d_TV ≥ ε, and for the last problem the upper bound is for
estimating the distance to uniformity to within an additive ±ε.



Testing equivalence to a known distribution. As described above, for the simple problem
of testing uniformity we have an essentially optimal PCOND testing algorithm and a matching
lower bound. Given these results it is natural to turn to the more challenging question of testing
whether D (accessible via a PCOND or COND oracle) is equivalent to D*, where D* is an arbitrary
"known" distribution over [N] that is explicitly provided to the testing algorithm at no cost (say
as a vector (D*(1), ..., D*(N)) of probability values). For this "known D*" problem, we give a
PCOND_D algorithm testing whether D = D* versus d_TV(D, D*) ≥ ε using Õ((log N)^4/ε^4) queries.
We further show that the (log N)^{Ω(1)} query complexity of our PCOND_D algorithm is inherent in the
problem, by proving that any PCOND_D algorithm for this problem must use Ω(√(log(N)/log log(N)))
queries for constant ε.

Given these (log N)^{Θ(1)} upper and lower bounds on the query complexity of PCOND_D-testing
equivalence to a known distribution, it is natural to ask whether the full COND_D oracle provides
more power for this problem. We show that this is indeed the case, by giving a Õ(1/ε^4)-query
algorithm (independent of N) that uses unrestricted COND_D queries.

Testing equivalence between two unknown distributions. We next consider the more
challenging problem of testing whether two unknown distributions D_1, D_2 over [N] (available via
COND_{D_1} and COND_{D_2} oracles) are identical versus ε-far. We give two very different algorithms for
this problem. The first uses PCOND oracles and has query complexity Õ((log N)^6/ε^21), while the
second uses COND oracles and has query complexity Õ((log N)^5/ε^4). We believe that the proof
technique of the second algorithm is of independent interest, since it shows how a COND_D oracle
can efficiently simulate an "approximate EVAL_D oracle." (An EVAL_D oracle takes as input a point
i ∈ [N] and outputs the probability mass D(i) that D puts on i; we briefly explain our notion of
approximating such an oracle in Subsection 1.3.3.)

Estimating the distance to uniformity. We also consider the problem of estimating the variation
distance between D and the uniform distribution U over [N], to within an additive error of ±ε.
In the standard SAMP_D model this is known to be a very difficult problem, with an Ω(N/log N)
lower bound established in [VV11, VV10a]. In contrast, we give a PCOND_D algorithm that makes
only Õ(1/ε^20) queries, independent of N.

1.3.2 Testing distributions over structured domains 

In the final portion of the paper we view the domain [N] as an ordered set 1 ≤ ··· ≤ N. (Note that
in all the testing problems and results described previously, the domain could just as well have been
viewed as an unstructured set of abstract points x_1, ..., x_N.) With this perspective it is natural to
consider ICOND algorithms.

We give an Õ((log N)^3/ε^3)-query ICOND_D algorithm for testing whether D is uniform versus
ε-far from uniform. We show that a (log N)^{Ω(1)} query complexity is inherent for uniformity testing
using ICOND_D, by proving an Ω(log N/log log N)-query ICOND_D lower bound.

1.3.3 A high-level discussion of our algorithms 

To maintain focus here we describe only the ideas behind our algorithms; intuition for each of our
lower bounds can be found in an informal discussion preceding the formal proof, see the beginnings
of Sections 4.2, 5.2 and 8.2. As can be seen in the following discussion, our algorithms share some
common themes, though each has its own unique idea/technique, which we emphasize below.

Our simplest testing algorithm is the algorithm for testing whether D is uniform over
[N] (using PCOND_D queries). The algorithm is based on the observation that if a distribution is
ε-far from uniform, then the total weight (according to D) of points y ∈ [N] for which D(y) ≥
(1 + Ω(ε))/N is Ω(ε), and the fraction of points x ∈ [N] for which D(x) ≤ (1 − Ω(ε))/N is Ω(ε). If we
obtain such a pair of points (x, y), then we can detect this deviation from uniformity by performing
Õ(1/ε^2) PCOND_D queries on the pair. Such a pair can be obtained with high probability by making
O(1/ε) SAMP_D queries (so as to obtain y) as well as selecting O(1/ε) points uniformly (so as to
obtain x). This approach yields an algorithm whose complexity grows like 1/ε^4. To actually get
an algorithm with query complexity Õ(1/ε^2) (which, as our lower bound shows, is tight), a slightly
more refined approach is applied.
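The simple (non-refined) approach described above can be sketched as follows. This is an illustrative sketch only, not the paper's final algorithm: the constants 4 and 1/8, the `delta` parameter and the helper `make_oracles` are assumptions chosen so that the code runs. It draws about 1/ε points y from SAMP_D and about 1/ε uniform points x, and for each pair estimates the PCOND probability of y, rejecting if it is noticeably far from 1/2. The total cost is roughly (1/ε)·(1/ε)·(1/ε^2) PCOND queries up to logarithmic factors.

```python
import math
import random

def make_oracles(probs, rng):
    """Simulate SAMP_D and PCOND_D for D(i) = probs[i - 1] (illustrative helper)."""
    domain = list(range(1, len(probs) + 1))

    def samp():
        return rng.choices(domain, weights=probs, k=1)[0]

    def pcond(x, y):
        # Draw from D restricted to {x, y}; requires D(x) + D(y) > 0.
        px, py = probs[x - 1], probs[y - 1]
        if px + py <= 0:
            raise ValueError("D({x, y}) = 0")
        return y if rng.random() * (px + py) < py else x

    return samp, pcond

def simple_uniformity_test(N, eps, samp, pcond, rng, delta=1 / 6):
    """Naive pair-comparison tester; all constants are illustrative assumptions."""
    k = math.ceil(4 / eps)            # number of points of each kind
    tau = eps / 8                     # allowed deviation of Pr[y] from 1/2
    # Hoeffding: m draws estimate Pr[y] to within tau, with failure
    # probability at most delta / k^2 per pair.
    m = math.ceil(math.log(2 * k * k / delta) / (2 * tau ** 2))
    ys = [samp() for _ in range(k)]               # likely to include a heavy point
    xs = [rng.randint(1, N) for _ in range(k)]    # likely to include a light point
    for y in ys:
        for x in xs:
            if x == y:
                continue
            frac_y = sum(pcond(x, y) == y for _ in range(m)) / m
            if abs(frac_y - 0.5) > tau:
                return "REJECT"
    return "ACCEPT"
```

Every pair queried contains a point y that was returned by SAMP_D, so the query set always has positive weight, in line with the convention that COND is only called on sets guaranteed to have D(S) > 0.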

When we take the next step to testing equality to an arbitrary (but fully specified)
distribution D*, the abovementioned observation generalizes so as to imply that if we sample
O(1/ε) points from D and O(1/ε) from D*, then with high probability we shall obtain a pair of
points (x, y) such that D(x)/D(y) differs by at least a (1 ± Ω(ε)) factor from D*(x)/D*(y). Unfortunately,
this cannot necessarily be detected by a small number of PCOND_D queries since (as opposed to the
uniform case), D*(x)/D*(y) may be very large or very small. However, we show that by sampling
from both D and D* and allowing the number of samples to grow with log N, with high probability
we either obtain a pair of points as described above for which D*(x)/D*(y) is a constant, or we
detect that for some set of points B we have that |D(B) − D*(B)| is relatively large.⁴

As noted previously, we prove a lower bound showing that a polynomial dependence on log N
is unavoidable if only PCOND_D queries (in addition to standard sampling) are allowed. To obtain
our more efficient poly(1/ε)-query algorithm, which uses more general COND_D queries, we extend
the observation from the uniform case in a different way. Specifically, rather than comparing the
relative weight of pairs of points, we compare the relative weight of pairs in which one element is a
point and the other is a subset of points. Roughly speaking, we show how points can be paired with
subsets of points of comparable weight (according to D*) such that the following holds. If D is far
from D*, then by taking O(1/ε) samples from D and selecting subsets of points in an appropriate
manner (depending on D*), we can obtain (with high probability) a point x and a subset Y such
that D(x)/D(Y) differs significantly from D*(x)/D*(Y) and D*(x)/D*(Y) is a constant.

In our next step, to testing equality between two unknown distributions D_1 and D_2, we
need to cope with the fact that we no longer "have a hold" on a known distribution. Our PCOND
algorithm can be viewed as creating such a hold in the following sense. By sampling from D_1 we
obtain (with high probability) a (relatively small) set of points R that cover the distribution D_1.
By "covering" we mean that except for a subset having small weight according to D_1, all points y
in [N] have a representative r ∈ R, i.e. a point r such that D_1(y) is close to D_1(r). We then show
that if D_2 is far from D_1, then one of the following must hold: (1) There is relatively large weight,
either according to D_1 or according to D_2, on points y such that for some r ∈ R we have that D_1(y)
is close to D_1(r) but D_2(y) is not sufficiently close to D_2(r); (2) There exists a point r ∈ R such
that the set of points y for which D_1(y) is close to D_1(r) has significantly different weight according
to D_2 as compared to D_1. We note that this algorithm can be viewed as a variant of the PCOND
algorithm for the case when one of the distributions is known (where the "buckets" B, which were
defined by D* in that algorithm (and were disjoint), are now defined by the points in R (and are
not necessarily disjoint)).

As noted previously, our (general) COND algorithm for testing the equality of two (unknown)
distributions is based on a subroutine that estimates D(x) (to within (1 ± O(ε))) for a given point
x given access to COND_D. Obtaining such an estimate for every x ∈ [N] cannot be done efficiently
for some distributions.⁵ However, we show that if we allow the algorithm to output UNKNOWN
on some subset of points with total weight O(ε), then the relaxed task can be performed using
poly(log N, 1/ε) queries, by performing a kind of randomized binary search "with exceptions".
This relaxed version, which we refer to as an approximate EVAL oracle, suffices for our needs
in distinguishing between the case that D_1 and D_2 are the same distribution and the case in which
they are far from each other. It is possible that this procedure will be useful for other tasks as well.

The algorithm for estimating the distance to uniformity (which uses poly(1/ε) PCOND_D
queries) is based on a subroutine for finding a reference point x together with an estimate D̂(x)
of D(x). A reference point should be such that D(x) is relatively close to 1/N (if such a point

⁴ Here we use B for "Bucket", as we consider a bucketing of the points in [N] based on their weight according
to D*. We note that bucketing has been used extensively in the context of testing properties of distributions, see
e.g. [BFR+10, BFF+01].

⁵ As an extreme case consider a distribution D for which D(1) = 1 − φ and D(2) = ··· = D(n) = φ/(n − 1) for
some very small φ (which in particular may depend on n), and for which we are interested in estimating D(2). This
requires Ω(1/φ) queries.
cannot be found then it is evidence that D is very far from uniform). Given a reference point
x (together with D̂(x)) it is possible to estimate the distance to uniformity by obtaining (using
PCOND queries) estimates of the ratio between D(x) and D(y) for poly(1/ε) uniformly selected
points y. The procedure for finding a reference point x together with D̂(x) is based on estimating
both the weight and the size of a subset of points y such that D(y) is close to D(x). The procedure
shares a common subroutine, Estimate-Neighborhood, with the PCOND algorithm for testing
equivalence between two unknown distributions.

Finally, the ICOND_D algorithm for testing uniformity is based on a version of the approximate
EVAL oracle mentioned previously, which on one hand uses only ICOND_D (rather than general
COND_D) queries, and on the other hand exploits the fact that we are dealing with the uniform
distribution rather than an arbitrary distribution.



1.4 The independent work [CFGM13]



In what follows we discuss the work of Chakraborty et al. [CFGM13], which was done independently
from our work, and was recently accepted to the ITCS conference (so that we learned about its
existence only a few days before the STOC submission deadline). Chakraborty et al. [CFGM13]
propose essentially the same COND model that we propose, differing only in what happens on query
sets S such that D(S) = 0. In our model such a query causes the COND oracle and algorithm to
return FAIL, while in their model such a query returns a uniform random i ∈ S. They present the
following results.

• An (adaptive) algorithm for testing uniformity that performs poly(1/ε) queries.⁶ The sets
on which the algorithm performs COND queries are of size linear in 1/ε. Recall that our
algorithm for this problem performs Õ(1/ε^2) PCOND queries and that we show that every
algorithm must perform Ω(1/ε^2) queries (when there is no restriction on the types of queries).
We note that their analysis uses the same observation that ours does regarding distributions
that are far from uniform (see the discussion in Subsection 1.3.3), but exploits it in a different
manner.

They also give a non-adaptive algorithm for this problem that performs poly(log N, 1/ε) COND
queries and show that Ω(log log N) is a lower bound on the necessary number of queries for
non-adaptive algorithms.

• An (adaptive) algorithm for testing whether D is equivalent to a specified distribution D*
using poly(log* N, 1/ε) COND queries. Recall that we give an algorithm for this problem that
performs Õ(1/ε^4) COND queries.

They also give a non-adaptive algorithm for this problem that performs poly(log N, 1/ε) COND
queries.

• An (adaptive) algorithm for testing any label-invariant (i.e., invariant under permutations of
the domain) property that performs poly(log N, 1/ε) COND queries. As noted in [CFGM13],
this in particular implies an algorithm with this complexity for estimating the distance to


⁶ The precise polynomial is not specified - we believe it is roughly as it follows from an application of the
identity tester of [BFF+01] with distance on a domain of size O(1/ε).
uniformity. Recall that we give an algorithm for this estimation problem that performs
poly(1/ε) PCOND queries.

The algorithm for testing any label-invariant property is based on learning a certain approximation
of the distribution D and in this process defining some sort of approximate EVAL
oracle. To the best of our understanding, our notion of an approximate EVAL oracle (which is
used to obtain one of our results for testing equivalence between two unknown distributions)
is quite different.

They also show that there exists a label-invariant property for which any adaptive algorithm
must perform Ω(√(log log N)) COND queries.

• Finally they show that there exist general properties that require Ω(N) COND queries.



2 Preliminaries 
2.1 Definitions 

Throughout the paper we shall work with discrete distributions over an N-element set whose
elements are denoted {1, ..., N}; we write [N] to denote {1, ..., N} and [a, b] to denote {a, ..., b}.
For a distribution D over [N] we write D(i) to denote the probability of i under D, and for S ⊆ [N]
we write D(S) to denote Σ_{i∈S} D(i). For S ⊆ [N] such that D(S) > 0 we write D_S to denote the
conditional distribution of D restricted to S, so D_S(i) = D(i)/D(S) for i ∈ S and D_S(i) = 0 for i ∉ S.

As is standard in property testing of distributions, throughout this work we measure the distance
between two distributions D_1 and D_2 using the total variation distance:

\[ d_{\mathrm{TV}}(D_1, D_2) = \frac{1}{2}\,\|D_1 - D_2\|_1 = \frac{1}{2} \sum_{i \in [N]} |D_1(i) - D_2(i)| = \max_{S \subseteq [N]} |D_1(S) - D_2(S)|. \]
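As a quick sanity check of the two equivalent forms of the total variation distance (half the L1 distance, and the maximum discrepancy over subsets), here is a small sketch. The function names are hypothetical, and the brute-force maximum over all subsets is only meant for tiny domains.

```python
from itertools import combinations

def tv_distance(d1, d2):
    """Half the L1 distance between two distributions given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))

def tv_distance_max_over_sets(d1, d2):
    """max over all subsets S of |D1(S) - D2(S)|; exponential time, small N only."""
    n = len(d1)
    best = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            gap = abs(sum(d1[i] for i in S) - sum(d2[i] for i in S))
            best = max(best, gap)
    return best
```

For D_1 = (1/2, 1/4, 1/4, 0) and D_2 uniform over four points, both functions give 1/4; the maximizing set is S = {1}.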

We may view a property P of distributions over [N] as a subset of all distributions over [N]
(consisting of all distributions that have the property). The distance from D to a property P,
denoted d_TV(D, P), is defined as inf_{D' ∈ P} d_TV(D, D').

We define testing algorithms for properties of distributions over [N] as follows:

Definition 2 Let P be a property of distributions over [N]. Let ORACLE_D be some type of oracle
which provides access to D. A q(ε, N)-query ORACLE testing algorithm for P is an algorithm
T which is given ε, N as input parameters and oracle access to an ORACLE_D oracle. For any
distribution D over [N] algorithm T makes at most q(ε, N) calls to ORACLE_D, and:

• if D ∈ P then with probability at least 2/3 algorithm T outputs ACCEPT;

• if d_TV(D, P) ≥ ε then with probability at least 2/3 algorithm T outputs REJECT.

This definition can easily be extended to cover situations in which there are two "unknown"
distributions D_1, D_2 that are accessible via ORACLE_{D_1} and ORACLE_{D_2} oracles. In particular we
shall consider algorithms for testing whether D_1 = D_2 versus d_TV(D_1, D_2) ≥ ε in such a setting. We
sometimes write T^{ORACLE_D} to indicate that T has access to ORACLE_D.

2.2 Useful tools 

On several occasions we will use the data processing inequality for variation distance. This fundamental
result says that for any two distributions D, D', applying any (possibly randomized)
function to D and D' can never increase their statistical distance; see e.g. part (iv) of Lemma 2 of
[Rey11] for a proof of this lemma.

Lemma 1 (Data Processing Inequality for Total Variation Distance) Let D, D' be two
distributions over a domain Ω. Fix any randomized function⁷ F on Ω, and let F(D) be the distribution
such that a draw from F(D) is obtained by drawing independently x from D and f from F
and then outputting f(x) (likewise for F(D')). Then we have

\[ d_{\mathrm{TV}}(F(D), F(D')) \le d_{\mathrm{TV}}(D, D'). \]
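A small numeric illustration of Lemma 1, under the assumption that the randomized function F is represented as a stochastic matrix `kernel`, where `kernel[x][z]` is the probability that f(x) = z. The representation and the function names are illustrative choices, not notation from the paper.

```python
def tv_distance(d1, d2):
    """Half the L1 distance between two distributions given as equal-length lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))

def push_forward(dist, kernel):
    """Distribution of f(x) when x ~ dist and f is drawn from the randomized
    function described by kernel (each row of kernel sums to 1)."""
    out_size = len(kernel[0])
    return [
        sum(dist[x] * kernel[x][z] for x in range(len(dist)))
        for z in range(out_size)
    ]

# Two distributions over a 3-point domain with variation distance 1/2.
D = [0.5, 0.5, 0.0]
D_prime = [0.0, 0.5, 0.5]

# A deterministic map that merges points 0 and 2: the distance collapses to 0.
merge = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

# A randomized map that sends the middle point to either output with prob. 1/2.
split = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

With these inputs the distance after applying `merge` is 0 and after applying `split` is 1/2, so in both cases it does not exceed the original distance of 1/2.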



We next give several variants of Chernoff bounds (see e.g. Chapter 4 of [MR95]).

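Theorem 1 Let Y_1, ..., Y_m be m independent random variables that take on values in [0, 1], where
E[Y_i] = p_i, and Σ_{i=1}^m p_i = P. For any γ ∈ (0, 1] we have

\[ \text{(additive bound)} \quad \Pr\Big[\sum_{i=1}^m Y_i > P + \gamma m\Big],\ \Pr\Big[\sum_{i=1}^m Y_i < P - \gamma m\Big] \le \exp(-2\gamma^2 m) \tag{1} \]

\[ \text{(multiplicative bound)} \quad \Pr\Big[\sum_{i=1}^m Y_i > (1+\gamma)P\Big] < \exp(-\gamma^2 P/3) \tag{2} \]

and

\[ \text{(multiplicative bound)} \quad \Pr\Big[\sum_{i=1}^m Y_i < (1-\gamma)P\Big] < \exp(-\gamma^2 P/2). \tag{3} \]

The bound in Equation (2) is derived from the following more general bound, which holds for any
γ > 0:

\[ \Pr\Big[\sum_{i=1}^m Y_i > (1+\gamma)P\Big] \le \Big(\frac{e^{\gamma}}{(1+\gamma)^{1+\gamma}}\Big)^{P}, \tag{4} \]

and which also implies that for any B > 2eP,

\[ \Pr\Big[\sum_{i=1}^m Y_i > B\Big] < 2^{-B}. \tag{5} \]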
⁷ Which can be seen as a distribution over functions over Ω.
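The step from the general bound (4) to the tail bound (5) of Theorem 1 is short. The following worked derivation is a sketch under the assumption P > 0 (for P = 0 the sum is zero almost surely and (5) is immediate), writing B = (1 + γ)P with γ = B/P − 1 > 0:

```latex
% Assumes P > 0 and B > 2eP; set \gamma = B/P - 1 > 0, so that B = (1+\gamma)P.
\begin{align*}
\Pr\Big[\sum_{i=1}^m Y_i > B\Big]
  &\le \Big(\frac{e^{\gamma}}{(1+\gamma)^{1+\gamma}}\Big)^{P}
  && \text{by (4)} \\
  &\le \Big(\frac{e}{1+\gamma}\Big)^{(1+\gamma)P}
  && \text{since } e^{\gamma} \le e^{1+\gamma} \\
  &= \Big(\frac{eP}{B}\Big)^{B}
  < 2^{-B}
  && \text{since } B > 2eP \text{ gives } eP/B < 1/2 .
\end{align*}
```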
The following extension of the multiplicative bound is useful when we only have upper and/or
lower bounds on P (see Exercise 1.1 of [DP09]):

Corollary 2 In the setting of Theorem 1 suppose that P_L ≤ P ≤ P_H. Then for any γ ∈ (0, 1], we
have

\[ \Pr\Big[\sum_{i=1}^m Y_i > (1+\gamma)P_H\Big] < \exp(-\gamma^2 P_H/3) \tag{6} \]

\[ \Pr\Big[\sum_{i=1}^m Y_i < (1-\gamma)P_L\Big] < \exp(-\gamma^2 P_L/2) \tag{7} \]


We will also use the following corollary of Theorem 1:

Corollary 3 Let 0 ≤ w_1, ..., w_m ∈ R be such that w_i ≤ κ for all i ∈ [m] where κ ∈ (0, 1].
Let X_1, ..., X_m be i.i.d. Bernoulli random variables with Pr[X_i = 1] = 1/2 for all i, and let
X = Σ_{i=1}^m w_i X_i and W = Σ_{i=1}^m w_i. For any γ ∈ (0, 1],

\[ \Pr\Big[X > (1+\gamma)\frac{W}{2}\Big] < \exp\Big(-\gamma^2 \frac{W}{6\kappa}\Big) \quad\text{and}\quad \Pr\Big[X < (1-\gamma)\frac{W}{2}\Big] < \exp\Big(-\gamma^2 \frac{W}{4\kappa}\Big), \]

and for any B > e · W,

\[ \Pr[X > B] < 2^{-B/\kappa}. \]

Proof: Let w'_i = w_i/κ (so that w'_i ∈ [0, 1]), let W' = Σ_{i=1}^m w'_i = W/κ, and for each i ∈ [m] let
Y_i = w'_i X_i, so that Y_i takes on values in [0, 1] and E[Y_i] = w'_i/2. Let X' = Σ_{i=1}^m w'_i X_i = Σ_{i=1}^m Y_i,
so that E[X'] = W'/2. By the definitions of W' and X' and by Equation (2), for any γ ∈ (0, 1],

\[ \Pr\Big[X > (1+\gamma)\frac{W}{2}\Big] = \Pr\Big[X' > (1+\gamma)\frac{W'}{2}\Big] < \exp\Big(-\gamma^2 \frac{W'}{6}\Big) = \exp\Big(-\gamma^2 \frac{W}{6\kappa}\Big), \tag{8} \]

and similarly by Equation (3)

\[ \Pr\Big[X < (1-\gamma)\frac{W}{2}\Big] < \exp\Big(-\gamma^2 \frac{W}{4\kappa}\Big). \tag{9} \]

For B > e · W = 2e · W/2 we apply Equation (5) and get

\[ \Pr[X > B] = \Pr[X' > B/\kappa] < 2^{-B/\kappa}, \tag{10} \]

as claimed. ∎



3 Some useful procedures 



In this section we describe some procedures that will be used by our algorithms. 
3.1 The procedure Compare



We start by describing a procedure that estimates the ratio between the weights of two disjoint sets 
of points by performing COND queries on the union of the sets. In the special case when each set 
is of size one, the queries performed are PCOND queries. 

Algorithm 1: Compare
Input: COND query access to a distribution $D$ over $[N]$, disjoint subsets $X, Y \subset [N]$,
parameters $\eta \in (0, 1]$, $K \ge 1$, and $\delta \in (0, 1/2]$.

1. Perform $\Theta\left(\frac{K \log(1/\delta)}{\eta^2}\right)$ COND$_D$ queries on the set $S = X \cup Y$, and let $\hat{\mu}$ be the fraction of
times that a point $y \in Y$ is returned.

2. If $\hat{\mu} < \frac{2}{3} \cdot \frac{1}{K+1}$, then return Low.

3. Else, if $1 - \hat{\mu} < \frac{2}{3} \cdot \frac{1}{K+1}$, then return High.

4. Else return $\rho = \frac{\hat{\mu}}{1-\hat{\mu}}$.
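The steps of Compare can be sketched as follows against a simulated COND oracle for an explicitly known distribution. This is only an illustration under stated assumptions: the helper `cond_sample`, the explicit list representation of $D$, and the constant `c` standing in for the hidden constant of the $\Theta(\cdot)$ are all hypothetical choices, and the oracle is only defined here for query sets of positive weight.

```python
import math
import random

def cond_sample(D, S, rng):
    """Simulated COND oracle (assumption): draw from D restricted to S.
    Requires D(S) > 0."""
    weights = [D[i] for i in S]
    assert sum(weights) > 0, "this sketch only handles sets of positive weight"
    return rng.choices(S, weights=weights, k=1)[0]

def compare(D, X, Y, eta, K, delta, rng, c=30):
    """Sketch of Algorithm 1: returns 'Low', 'High', or an estimate of D(Y)/D(X).
    c is a hypothetical stand-in for the constant hidden in Theta(.)."""
    X, Y = set(X), set(Y)
    assert X.isdisjoint(Y)
    S = sorted(X | Y)
    n = math.ceil(c * K * math.log(1 / delta) / eta**2)   # number of COND queries
    hits = sum(cond_sample(D, S, rng) in Y for _ in range(n))
    mu = hits / n                                          # fraction of draws landing in Y
    threshold = (2 / 3) / (K + 1)
    if mu < threshold:
        return "Low"
    if 1 - mu < threshold:
        return "High"
    return mu / (1 - mu)

rng = random.Random(0)
D = [0.25, 0.5, 0.25, 0.0]
print(compare(D, {0}, {1}, eta=0.5, K=4, delta=0.1, rng=rng))  # estimate of D(1)/D(0) = 2
print(compare(D, {3}, {1}, eta=0.5, K=4, delta=0.1, rng=rng))  # D(X) = 0, so 'High'
```

When one of the two sets has weight zero, every draw lands in the other set, so the Low/High outcome in that case is deterministic.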



Lemma 2 Given as input two disjoint subsets of points $X, Y$ together with parameters $\eta \in (0, 1]$,
$K \ge 1$, and $\delta \in (0, 1/2]$, as well as COND query access to a distribution $D$, the procedure Compare
(Algorithm 1) either outputs a value $\rho > 0$ or outputs High or Low, and satisfies the following:

1. If $D(X)/K \le D(Y) \le K \cdot D(X)$ then with probability at least $1 - \delta$ the procedure outputs a
value $\rho \in [1-\eta, 1+\eta]\,D(Y)/D(X)$;

2. If $D(Y) > K \cdot D(X)$ then with probability at least $1 - \delta$ the procedure outputs either High or
a value $\rho \in [1-\eta, 1+\eta]\,D(Y)/D(X)$;

3. If $D(Y) < D(X)/K$ then with probability at least $1 - \delta$ the procedure outputs either Low or a
value $\rho \in [1-\eta, 1+\eta]\,D(Y)/D(X)$.

The procedure performs $O\left(\frac{K \log(1/\delta)}{\eta^2}\right)$ COND queries on the set $X \cup Y$.

Proof: The bound on the number of queries performed by the algorithm follows directly from the
description of the algorithm, and hence we turn to establishing its correctness.

Let $w(X) = \frac{D(X)}{D(X)+D(Y)}$ and let $w(Y) = \frac{D(Y)}{D(X)+D(Y)}$. Observe that $\frac{w(Y)}{w(X)} = \frac{D(Y)}{D(X)}$ and that for $\hat{\mu}$
as defined in Line 1 of the algorithm, $\mathrm{E}[\hat{\mu}] = w(Y)$ and $\mathrm{E}[1 - \hat{\mu}] = w(X)$. Also observe that for any
$B \ge 1$, if $D(Y) \ge D(X)/B$, then $w(Y) \ge \frac{1}{B+1}$, and if $D(Y) \le B \cdot D(X)$, then $w(X) \ge \frac{1}{B+1}$.

Let $E_1$ be the event that $\hat{\mu} \in [1 - \eta/3, 1 + \eta/3]\,w(Y)$ and let $E_2$ be the event that $(1 - \hat{\mu}) \in
[1 - \eta/3, 1 + \eta/3]\,w(X)$. Given the number of COND queries performed on the set $X \cup Y$, by
applying a multiplicative Chernoff bound (see Theorem 1), if $w(Y) \ge \frac{1}{4K}$ then with probability at
least $1 - \delta/2$ the event $E_1$ holds, and if $w(X) \ge \frac{1}{4K}$ then with probability at least $1 - \delta/2$ the event
$E_2$ holds. We next consider the three cases in the lemma statement.

1. If $D(X)/K \le D(Y) \le K \cdot D(X)$, then by the discussion above, $w(Y) \ge \frac{1}{K+1}$, $w(X) \ge \frac{1}{K+1}$,
and with probability at least $1 - \delta$ we have that $\hat{\mu} \in [1 - \eta/3, 1 + \eta/3]\,w(Y)$ and $(1 - \hat{\mu}) \in
[1 - \eta/3, 1 + \eta/3]\,w(X)$. Conditioned on these bounds holding,

$$\hat{\mu} \ge \frac{1-\eta/3}{K+1} \ge \frac{2}{3} \cdot \frac{1}{K+1} \quad\text{and}\quad 1 - \hat{\mu} \ge \frac{2}{3} \cdot \frac{1}{K+1}.$$

It follows that the procedure outputs a value $\rho = \frac{\hat{\mu}}{1-\hat{\mu}} \in [1-\eta, 1+\eta]\,\frac{w(Y)}{w(X)}$, as required by
Item 1.

2. If $D(Y) > K \cdot D(X)$, then we consider two subcases.

(a) If $D(Y) > 3K \cdot D(X)$, then $w(X) < \frac{1}{3K+1}$, so that by a multiplicative Chernoff bound
(stated in Corollary 2), with probability at least $1 - \delta$ we have that

$$1 - \hat{\mu} < \frac{1+\eta/3}{3K+1} \le \frac{4}{3} \cdot \frac{1}{3K+1} \le \frac{2}{3} \cdot \frac{1}{K+1},$$

causing the algorithm to output High. Thus Item 2 is established for this subcase.

(b) If $K \cdot D(X) < D(Y) \le 3K \cdot D(X)$, then $w(X) \ge \frac{1}{3K+1}$ and $w(Y) \ge \frac{1}{2}$, so that the events
$E_1$ and $E_2$ both hold with probability at least $1 - \delta$. Assume that these events in fact
hold. This implies that $\hat{\mu} \ge \frac{1-\eta/3}{2} \ge \frac{2}{3} \cdot \frac{1}{K+1}$, and thus the algorithm either outputs High or
outputs $\rho = \frac{\hat{\mu}}{1-\hat{\mu}} \in [1-\eta, 1+\eta]\,\frac{D(Y)}{D(X)}$, so Item 2 is established for this subcase as well.

3. If $D(Y) < D(X)/K$, so that $D(X) > K \cdot D(Y)$, then the exact same arguments are applied
as in the previous case, just switching the roles of $Y$ and $X$ and the roles of $\hat{\mu}$ and $1 - \hat{\mu}$ so
as to establish Item 3.

We have thus established all items in the lemma. ■



3.2 The procedure Estimate-Neighborhood 

In this subsection we describe a procedure that, given a point $x$, provides an estimate of the weight
of a set of points $y$ such that $D(y)$ is similar to $D(x)$. In order to specify the behavior of the
procedure more precisely, we introduce the following notation. For a distribution $D$ over $[N]$, a
point $x \in [N]$ and a parameter $\gamma \in [0, 1]$, let

$$U_\gamma(x) := \left\{ y \in [N] : \frac{1}{1+\gamma} D(x) \le D(y) \le (1+\gamma) D(x) \right\} \tag{11}$$

denote the set of points whose weight is "$\gamma$-close" to the weight of $x$. If we take a sample of points
distributed according to $D$, then the expected fraction of these points that belong to $U_\gamma(x)$ is
$D(U_\gamma(x))$. If this value is not too small, then the actual fraction in the sample is close to the
expected value. Hence, if we could efficiently determine for any given point $y$ whether or not it
belongs to $U_\gamma(x)$, then we could obtain a good estimate of $D(U_\gamma(x))$. The difficulty is that it is
not possible to perform this task efficiently for "boundary" points $y$ such that $D(y)$ is very close
to $(1+\gamma)D(x)$ or to $\frac{1}{1+\gamma}D(x)$. However, for our purposes, it is not important that we obtain the
weight and size of $U_\gamma(x)$ for a specific $\gamma$, but rather it suffices to do so for $\gamma$ in a given range, as
stated in the next lemma.
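To make the definition of $U_\gamma(x)$ and the "boundary point" issue concrete, here is a small sketch that computes the neighborhood exactly for an explicitly given distribution. The function names and the toy distribution are illustrative assumptions; exact rationals are used so that boundary comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

def neighborhood(D, x, gamma):
    """U_gamma(x) = { y : D(x)/(1+gamma) <= D(y) <= (1+gamma) D(x) } for explicit D."""
    lo, hi = D[x] / (1 + gamma), (1 + gamma) * D[x]
    return {y for y in range(len(D)) if lo <= D[y] <= hi}

def weight(D, S):
    """D(S): total probability mass of the set S."""
    return sum(D[y] for y in S)

# Hypothetical distribution over 4 points.
D = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]

U = neighborhood(D, 1, Fraction(1, 2))            # point 2 lies exactly on the boundary
U_smaller = neighborhood(D, 1, Fraction(49, 100)) # slightly smaller gamma drops it
print(sorted(U), weight(D, U))
print(sorted(U_smaller), weight(D, U_smaller))
```

A tiny change in $\gamma$ moves the boundary point in or out of the set and changes the weight by $D(y)$, which is why the lemma only asks for an estimate at some $\alpha$ in a range rather than at a fixed $\gamma$.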
Lemma 3 Given as input a point $x$ together with parameters $\kappa, \beta, \eta, \delta \in (0, 1/2]$ as well as PCOND
query access to a distribution $D$, the procedure Estimate-Neighborhood (Algorithm 2) outputs
a pair $(\hat{w}, \alpha) \in [0, 1] \times (\kappa, 2\kappa)$ such that $\alpha$ is uniformly distributed in $\{\kappa + i\theta\}_{i=0}^{\kappa/\theta - 1}$ for $\theta = \frac{\kappa\eta\beta\delta}{64}$,
and such that the following holds:

1. If $D(U_\alpha(x)) \ge \beta$, then with probability at least $1 - \delta$ we have $\hat{w} \in [1 - \eta, 1 + \eta] \cdot D(U_\alpha(x))$,
and $D(U_{\alpha+\theta}(x) \setminus U_\alpha(x)) \le \eta\beta/16$;

2. If $D(U_\alpha(x)) < \beta$, then with probability at least $1 - \delta$ we have $\hat{w} \le (1 + \eta) \cdot \beta$, and
$D(U_{\alpha+\theta}(x) \setminus U_\alpha(x)) \le \eta\beta/16$.

The number of PCOND queries performed by the procedure is $O\left(\frac{\log(1/\delta) \cdot \log(\log(1/\delta)/(\delta\beta\eta^2))}{\kappa^2\eta^4\beta^3\delta^2}\right)$.



Algorithm 2: Estimate-Neighborhood
Input: PCOND query access to a distribution $D$ over $[N]$, a point $x \in [N]$ and parameters
$\kappa, \beta, \eta, \delta \in (0, 1/2]$
1: Set $\theta = \frac{\kappa\eta\beta\delta}{64}$ and $r = \frac{\kappa}{\theta} = \frac{64}{\eta\beta\delta}$.
2: Select a value $\alpha \in \{\kappa + i\theta\}_{i=0}^{r-1}$ uniformly at random.
3: Call the SAMP$_D$ oracle $\Theta(\log(1/\delta)/(\beta\eta^2))$ times and let $S$ be the set of points obtained.
4: For each point $y$ in $S$ call Compare$_D(\{x\}, \{y\}, \theta/4, 4, \delta/(4|S|))$ (if a point $y$ appears more
than once in $S$, then Compare is called only once on $y$).
5: Let $\hat{w}$ be the fraction of occurrences of points $y$ in $S$ for which Compare returned a value
$\rho(y) \in [1/(1 + \alpha + \theta/2), (1 + \alpha + \theta/2)]$. (That is, $S$ is viewed as a multiset.)
6: Return $(\hat{w}, \alpha)$.



Proof of Lemma 3: The number of PCOND queries performed by Estimate-Neighborhood
is the size of $S$ times the number of PCOND queries performed in each call to Compare. By
the setting of the parameters in the calls to Compare, the total number of PCOND queries is
$O\left(\frac{|S| \cdot \log(|S|/\delta)}{\theta^2}\right) = O\left(\frac{\log(1/\delta) \cdot \log(\log(1/\delta)/(\delta\beta\eta^2))}{\kappa^2\eta^4\beta^3\delta^2}\right)$. We now turn to establishing the correctness of
the procedure.

Since $D$ and $x$ are fixed, in what follows we shall use the shorthand $U_\gamma$ for $U_\gamma(x)$. For
$\alpha \in \{\kappa + i\theta\}_{i=0}^{r-1}$, let $\Delta_\alpha := U_{\alpha+\theta} \setminus U_\alpha$. We next define several "desirable" events. In all that
follows we view $S$ as a multiset.

1. Let $E_1$ be the event that $D(\Delta_\alpha) \le 4/(\delta r)$. Since there are $r$ disjoint sets $\Delta_\alpha$ for
$\alpha \in \{\kappa + i\theta\}_{i=0}^{r-1}$, the probability that $E_1$ occurs (taken over the uniform choice of $\alpha$) is at
least $1 - \delta/4$. From this point on we fix $\alpha$ and assume $E_1$ holds.

2. The event $E_2$ is that $|S \cap \Delta_\alpha|/|S| \le 8/(\delta r)$ (that is, at most twice the upper bound on
the expected value). By applying the multiplicative Chernoff bound using the fact that
$|S| = \Theta(\log(1/\delta)/(\beta\eta^2)) = \Omega(\log(1/\delta) \cdot (\delta r))$, we have that $\Pr_S[E_2] \ge 1 - \delta/4$.

3. The event $E_3$ is defined as follows: If $D(U_\alpha) \ge \beta$, then $|S \cap U_\alpha|/|S| \in [1 - \eta/2, 1 + \eta/2] \cdot D(U_\alpha)$,
and if $D(U_\alpha) < \beta$, then $|S \cap U_\alpha|/|S| < (1 + \eta/2) \cdot \beta$. Once again applying the multiplicative
Chernoff bound (for both cases) and using the fact that $|S| = \Theta(\log(1/\delta)/(\beta\eta^2))$, we have
that $\Pr_S[E_3] \ge 1 - \delta/4$.

4. Let $E_4$ be the event that all calls to Compare return an output as specified in Lemma 2. Given
the setting of the confidence parameter in the calls to Compare we have that $\Pr[E_4] \ge 1 - \delta/4$
as well.



Assume from this point on that events $E_1$ through $E_4$ all hold, which occurs with probability at
least $1 - \delta$. By the definition of $\Delta_\alpha$ and $E_1$ we have that $D(U_{\alpha+\theta} \setminus U_\alpha) \le 4/(\delta r) = \eta\beta/16$, as required
(in both items of the lemma). Let $T$ be the (multi-)subset of points $y$ in $S$ for which Compare
returned a value $\rho(y) \in [1/(1 + \alpha + \theta/2), (1 + \alpha + \theta/2)]$ (so that $\hat{w}$, as defined in the algorithm, equals
$|T|/|S|$). Note first that conditioned on $E_4$ we have that for every $y \in U_{2\kappa}$ it holds that the output of
Compare when called on $\{x\}$ and $\{y\}$, denoted $\rho(y)$, satisfies $\rho(y) \in [1 - \theta/4, 1 + \theta/4](D(y)/D(x))$,
while for $y \notin U_{2\kappa}$ either Compare outputs High or Low or it outputs a value $\rho(y) \in [1 - \theta/4, 1 +
\theta/4](D(y)/D(x))$. This implies that if $y \in U_\alpha$, then $\rho(y) \le (1 + \alpha) \cdot (1 + \theta/4) \le 1 + \alpha + \theta/2$ and
$\rho(y) \ge (1 + \alpha)^{-1} \cdot (1 - \theta/4) \ge (1 + \alpha + \theta/2)^{-1}$, so that $S \cap U_\alpha \subseteq T$. On the other hand, if $y \notin U_{\alpha+\theta}$
then either $\rho(y) > (1 + \alpha + \theta) \cdot (1 - \theta/4) \ge 1 + \alpha + \theta/2$ or $\rho(y) < (1 + \alpha + \theta)^{-1} \cdot (1 + \theta/4) \le (1 + \alpha + \theta/2)^{-1}$,
so that $T \subseteq S \cap U_{\alpha+\theta}$. Combining the two we have:

$$S \cap U_\alpha \subseteq T \subseteq S \cap U_{\alpha+\theta}. \tag{12}$$

Recalling that $\hat{w} = \frac{|T|}{|S|}$, the left-hand side of Equation (12) implies that

$$\hat{w} \ge \frac{|S \cap U_\alpha|}{|S|}, \tag{13}$$

and by $E_1$ and $E_2$, the right-hand side of Equation (12) implies that

$$\hat{w} \le \frac{|S \cap U_\alpha|}{|S|} + \frac{8}{\delta r} = \frac{|S \cap U_\alpha|}{|S|} + \frac{\eta\beta}{8}. \tag{14}$$

We consider the two cases stated in the lemma:

1. If $D(U_\alpha) \ge \beta$, then by Equation (13), Equation (14) and (the first part of) $E_3$, we have that
$\hat{w} \in [1 - \eta, 1 + \eta] \cdot D(U_\alpha)$.

2. If $D(U_\alpha) < \beta$, then by Equation (14) and (the second part of) $E_3$, we have that $\hat{w} \le (1 + \eta)\beta$.

The lemma is thus established. ■



4 Algorithms and lower bounds for testing uniformity 

4.1 A $\tilde{O}(1/\epsilon^2)$-query PCOND algorithm for testing uniformity

In this subsection we present an algorithm PCOND$_D$-Test-Uniform and prove the following theorem:

Theorem 4 PCOND$_D$-Test-Uniform is a $\tilde{O}(1/\epsilon^2)$-query PCOND$_D$ testing algorithm for uniformity,
i.e. it outputs ACCEPT with probability at least 2/3 if $D = \mathcal{U}$ and outputs REJECT with
probability at least 2/3 if $d_{TV}(D, \mathcal{U}) \ge \epsilon$.



Intuition. For the sake of intuition we first describe a simpler approach that yields a $\tilde{O}(1/\epsilon^4)$-query
algorithm, and then build on those ideas to obtain our real algorithm with its improved $\tilde{O}(1/\epsilon^2)$
bound. Fix $D$ to be a distribution over $[N]$ that is $\epsilon$-far from uniform. Let

$$H = \left\{ h \in [N] : D(h) \ge \frac{1}{N} \right\} \quad\text{and}\quad L = \left\{ \ell \in [N] : D(\ell) < \frac{1}{N} \right\}.$$

It is easy to see that since $D$ is $\epsilon$-far from uniform, we have

$$\sum_{h \in H} \left( D(h) - \frac{1}{N} \right) = \sum_{\ell \in L} \left( \frac{1}{N} - D(\ell) \right) \ge \frac{\epsilon}{2}. \tag{15}$$

From this it is not hard to show that

(i) many elements of $[N]$ must be "significantly light" in the following sense: Define $L' \subseteq L$ to
be $L' = \{ \ell \in [N] : D(\ell) \le \frac{1}{N} - \frac{\epsilon}{4N} \}$. Then it must be the case that $|L'| \ge (\epsilon/4)N$.

(ii) $D$ places significant weight on elements that are "significantly heavy" in the following sense:
Define $H' \subseteq H$ to be $H' = \{ h \in [N] : D(h) \ge \frac{1}{N} + \frac{\epsilon}{4N} \}$. Then it must be the case that
$D(H') \ge \epsilon/4$.

Using (i) and (ii) it is fairly straightforward to give a $\tilde{O}(1/\epsilon^4)$-query PCOND$_D$ testing algorithm
as follows: we can get a point in $L'$ with high probability by sampling $O(1/\epsilon)$ points
uniformly at random from $[N]$, and we can get a point in $H'$ with high probability by drawing
$O(1/\epsilon)$ points from SAMP$_D$. Then at least one of the $O(1/\epsilon^2)$ pairs that have one point from the
first sample and one point from the second will have a multiplicative factor difference of $1 + \Omega(\epsilon)$
between the weight under $D$ of the two points, and this can be detected by calling the procedure
Compare (see Subsection 3.1). Since there are $O(1/\epsilon^2)$ pairs and for each one the invocation of
Compare uses $\tilde{O}(1/\epsilon^2)$ queries, the overall sample complexity of this simple approach is $\tilde{O}(1/\epsilon^4)$.

Our actual algorithm PCOND$_D$-Test-Uniform for testing uniformity extends the above ideas
to get a $\tilde{O}(1/\epsilon^2)$-query algorithm. More precisely, the algorithm works as follows: it first draws a
"reference sample" of $O(1)$ points uniformly from $[N]$. Next, repeatedly for $O(\log\frac{1}{\epsilon})$ iterations, the
algorithm draws two other samples, one uniformly from $[N]$ and the other from SAMP$_D$. (These
samples have different sizes at different iterations; intuitively, each iteration is meant to deal with
a different "scale" of probability mass that points could have under $D$.) At each iteration it then
uses Compare to do comparisons between pairs of elements, one from the reference sample and
the other from one of the two other samples. If $D$ is $\epsilon$-far from uniform, then with high probability
at some iteration the algorithm will either draw a point from SAMP$_D$ that has "very big" mass
under $D$, or draw a point from the uniform distribution over $[N]$ that has "very small" mass under
$D$, and this will be detected by the comparisons to the reference points. Choosing the sample sizes
and parameters for the Compare calls carefully at each iteration yields the improved query bound.
Algorithm 3: PCOND$_D$-Test-Uniform

Input: error parameter $\epsilon > 0$; query access to PCOND$_D$ oracle
1: Set $t = \log(\frac{4}{\epsilon}) + 1$.
2: Select $q = \Theta(1)$ points $i_1, \dots, i_q$ independently and uniformly from $[N]$.
3: for $j = 1$ to $t$ do
4: Call the SAMP$_D$ oracle $s_j = \Theta(2^j \cdot t)$ times to obtain points $h_1, \dots, h_{s_j}$ distributed
according to $D$.
5: Select $s_j$ points $\ell_1, \dots, \ell_{s_j}$ independently and uniformly from $[N]$.
6: for all pairs $(x, y) = (i_r, h_{r'})$ and $(x, y) = (i_r, \ell_{r'})$ (where $1 \le r \le q$, $1 \le r' \le s_j$) do
7: Call Compare$_D(\{x\}, \{y\}, \Theta(\epsilon 2^j), 2, \exp(-\Theta(t)))$.
8: if the Compare call does not return a value in $[1 - 2^{j-5}\frac{\epsilon}{4}, 1 + 2^{j-5}\frac{\epsilon}{4}]$ then
9: output REJECT (and exit).
10: end if
11: end for
12: end for
13: Output ACCEPT



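Let $m_j$ denote the number of PCOND$_D$ queries used to run Compare$_D$ in a given execution of
Line 7 during the $j$-th iteration of the outer loop. By the setting of the parameters in each such
call and Lemma 2, $m_j = O\left(\frac{\log(1/\epsilon)}{\epsilon^2 2^{2j}}\right)$. It is easy to see that the algorithm only performs PCOND$_D$
queries and that the total number of queries that the algorithm performs is

$$O\left(\sum_{j=1}^{t} q \cdot s_j \cdot m_j\right) = O\left(\sum_{j=1}^{t} 2^j \log\left(\frac{1}{\epsilon}\right) \cdot \frac{\log(\frac{1}{\epsilon})}{\epsilon^2 2^{2j}}\right) = O\left(\frac{(\log(\frac{1}{\epsilon}))^2}{\epsilon^2}\right).$$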



We prove Theorem 4 by arguing completeness and soundness below.

Completeness: Suppose that $D$ is the uniform distribution. Then for any fixed pair of points
$(x, y)$, Lemma 2 implies that the call to Compare on $\{x\}, \{y\}$ in Line 7 causes the algorithm to
output REJECT in Line 9 with probability at most $e^{-\Theta(t)} = \mathrm{poly}(\epsilon)$. By taking a union bound over
all $\mathrm{poly}(1/\epsilon)$ pairs of points considered by the algorithm, the algorithm will accept with probability
at least 2/3, as required.

Soundness: Now suppose that $D$ is $\epsilon$-far from uniform (we assume throughout the analysis that
$\epsilon = 1/2^k$ for some integer $k$, which is clearly without loss of generality). We define $H, L$ as above
and further partition $H$ and $L$ into "buckets" as follows: for $j = 1, \dots, t - 1 = \log(\frac{4}{\epsilon})$, let



and 

Also define 

dcf 



Hi 







H, 



L 



dcf 



dcf 



1 + 2^- 



j^<D(h)<(l + - 



1 

N 



< D{h) < 1 + 2-'' 



2^- 



1 



dcf 



1 

' iV 

1 

A 



i)4<°("<^ 
and



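First observe that by the definition of $H_0$ and $L_0$, we have

$$\sum_{h \in H_0} \left( D(h) - \frac{1}{N} \right) \le \frac{\epsilon}{4} \quad\text{and}\quad \sum_{\ell \in L_0} \left( \frac{1}{N} - D(\ell) \right) \le \frac{\epsilon}{4}.$$

Therefore (by Equation (15)) we have

$$\sum_{j=1}^{t} \sum_{h \in H_j} \left( D(h) - \frac{1}{N} \right) \ge \frac{\epsilon}{4} \quad\text{and}\quad \sum_{j=1}^{t} \sum_{\ell \in L_j} \left( \frac{1}{N} - D(\ell) \right) \ge \frac{\epsilon}{4}.$$

This implies that for some $1 \le j(H) \le t$, and some $1 \le j(L) \le t$ we have

$$\sum_{h \in H_{j(H)}} \left( D(h) - \frac{1}{N} \right) \ge \frac{\epsilon}{4t} \quad\text{and}\quad \sum_{\ell \in L_{j(L)}} \left( \frac{1}{N} - D(\ell) \right) \ge \frac{\epsilon}{4t}. \tag{16}$$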



The rest of the analysis is divided into two cases depending on whether $|L| \ge \frac{N}{2}$ or $|H| > \frac{N}{2}$.

Case 1: $|L| \ge \frac{N}{2}$. In this case, with probability at least 99/100, in Line 2 the algorithm will select
at least one point $i_r \in L$. We consider two subcases: $j(H) = t$, and $j(H) \le t - 1$.



j{H) = t: In this subcase, by Equation (|16p we have that X^fte-ff — This implies 

that when j = j{H) = t = log(|) + 1, so that Sj = St = ©(f)) with probability at least 
99/100 the algorithm selects a point h^i G Hf in LineHl Assume that indeed such a point /i^/ 
is selected. Since D(h^i) > while D{ir) < jj', Lemma [2] implies that with probability at 
least 1 — poly(e) the Compare call in Line [7] outputs either High or a value that is at least 
^ = i + ^. Since ^ > | + 2-'~^| for j = t, the algorithm will output REJECT in Line[9l 

j{H) < t: By Equation (jl6p and the definition of the buckets, we have 



implying that > ^j^sy^ so that D{Hj(^H^) > ^jp^^- Therefore, when j = j{H) so that 

Sj = 0(2-'(^)t), with probability at least 99/100 the algorithm will get a point h^' G ^j{H) 
Linedl Assume that indeed such a point hr' is selected. Since D{hr') > (l + 2-^(^)~^|) , 
while D{ir) < for aj(H) = 2^^^^~^j, we have 

Since Compare is called in Line[7]on the pair {ir}, {hr'} with the "5" parameter set to Q{e2^), 
with probability 1 — poly(e) the algorithm outputs REJECT as a result of this Compare call. 
Case 2: $|H| > \frac{N}{2}$. This proceeds similarly to Case 1. In this case we have that with high constant
probability the algorithm selects a point $i_r \in H$ in Line 2. Here we consider the subcases $j(L) = t$
and j{L) <t — 1. In the first subcase we have that Xl^eLt W — 1-^*1 — (^)"'' ^^'^ 

second case we have that X^^gj^, ^ (2-'^^^|);^ ^ so that |-t'j(L)| > 23^X)^- '^^^ analysis of each 
subcase is similar to Case 1. This concludes the proof of Theorem HI ■ 



4.2 An $\Omega(1/\epsilon^2)$ lower bound for COND$_D$ algorithms that test uniformity

In this subsection we give a lower bound showing that the query complexity of the PCOND$_D$
algorithm of the previous subsection is essentially optimal, even for algorithms that may make
general COND$_D$ queries:

Theorem 5 Any COND$_D$ algorithm for testing whether $D = \mathcal{U}$ versus $d_{TV}(D, \mathcal{U}) \ge \epsilon$ must make
$\Omega(1/\epsilon^2)$ queries.

The high-level idea behind Theorem 5 is to reduce it to the well-known fact that distinguishing
a fair coin from a $(\frac{1}{2} + 4\epsilon)$-biased coin requires $\Omega(\frac{1}{\epsilon^2})$ coin tosses. We show that any $q$-query
COND$_D$ testing algorithm $A$ can be transformed into an algorithm $A'$ that successfully
distinguishes $q$ tosses of a fair coin from $q$ tosses of a $(\frac{1}{2} + 4\epsilon)$-biased coin.

Proof of Theorem 5: First note that we may assume without loss of generality that $0 < \epsilon \le 1/8$.
Let $A$ be any $q$-query algorithm that makes COND$_D$ queries and tests whether $D = \mathcal{U}$ versus
$d_{TV}(D, \mathcal{U}) \ge \epsilon$. We may assume without loss of generality that in every possible execution algorithm
$A$ makes precisely $q$ queries (this will be convenient later).



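Let $D_{\mathrm{no}}$ be the distribution that has $D_{\mathrm{no}}(i) = \frac{1+2\epsilon}{N}$ for each $i \in [1, \frac{N}{2}]$ and has $D_{\mathrm{no}}(i) = \frac{1-2\epsilon}{N}$
for each $i \in [\frac{N}{2} + 1, N]$. (This is the "no"-distribution for our lower bound; it is $\epsilon$-far in variation
distance from the uniform distribution $\mathcal{U}$.) By Definition 2 it must be the case that

$$Z := \left| \Pr\left[ A^{\mathrm{COND}_{D_{\mathrm{no}}}} \text{ outputs ACCEPT} \right] - \Pr\left[ A^{\mathrm{COND}_{\mathcal{U}}} \text{ outputs ACCEPT} \right] \right| \ge 1/3.$$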



The proof works by showing that given $A$ as described above, there must exist an algorithm
$A'$ with the following properties: $A'$ is given as input a $q$-bit string $(b_1, \dots, b_q) \in \{0, 1\}^q$. Let $D_0$
denote the uniform distribution over $\{0, 1\}^q$ and let $D_{4\epsilon}$ denote the distribution over $\{0, 1\}^q$ in
which each coordinate is independently set to 1 with probability $1/2 + 4\epsilon$. Then algorithm $A'$ has

$$\left| \Pr_{b \sim D_0}[A'(b) \text{ outputs ACCEPT}] - \Pr_{b \sim D_{4\epsilon}}[A'(b) \text{ outputs ACCEPT}] \right| = Z. \tag{17}$$

Given (17), by the data processing inequality for total variation distance (Lemma 1) we have that
$Z \le d_{TV}(D_0, D_{4\epsilon})$. It is easy to see that $d_{TV}(D_0, D_{4\epsilon})$ is precisely equal to the variation distance
$d_{TV}(\mathrm{Bin}(q, 1/2), \mathrm{Bin}(q, 1/2 + 4\epsilon))$. However, in order for the variation distance between these two
binomial distributions to be as large as 1/3 it must be the case that $q \ge \Omega(1/\epsilon^2)$:

Fact 4 (Distinguishing Fair from Biased Coin) Suppose $m \le \frac{c}{\epsilon^2}$, with $c$ a sufficiently small
constant and $\epsilon \le 1/8$. Then,

$$d_{TV}\left( \mathrm{Bin}\left(m, \tfrac{1}{2}\right), \mathrm{Bin}\left(m, \tfrac{1}{2} + 4\epsilon\right) \right) < \frac{1}{3}.$$

(Fact 4 is well known; it follows, for example, as an immediate consequence of Equations (2.15)
and (2.16) of [AJ06].) Thus to prove Theorem 5 it remains only to describe algorithm $A'$ and prove
Equation (17).

As suggested above, algorithm $A'$ uses algorithm $A$; in order to do this, it must perfectly
simulate the COND$_D$ oracle that $A$ requires, both in the case when $D = \mathcal{U}$ and in the case when
$D = D_{\mathrm{no}}$. We show below that when its input $b$ is drawn from $D_0$ then $A'$ can perfectly simulate
the execution of $A$ when it is run on the COND$_{\mathcal{U}}$ oracle, and when $b$ is drawn from $D_{4\epsilon}$ then $A'$ can
perfectly simulate the execution of $A$ when it is run on the COND$_{D_{\mathrm{no}}}$ oracle.

Fix any step $1 \le t \le q$. We now describe how $A'$ perfectly simulates the $t$-th step of the
execution of $A$ (i.e. the $t$-th call to COND$_D$ that $A$ makes, and the response of COND$_D$). We may
inductively assume that $A'$ has perfectly simulated the first $t - 1$ steps of the execution of $A$.

For each possible prefix of $t - 1$ query-response pairs to COND$_D$

$$\mathrm{PREFIX} = ((S_1, s_1), \dots, (S_{t-1}, s_{t-1}))$$

(where each $S_i \subseteq [N]$ and each $s_i \in S_i$), there is some distribution $P_{A,\mathrm{PREFIX}}$ over possible $t$-th
query sets $S_t$ that $A$ would make given that its first $t - 1$ query-response pairs were PREFIX. So
for a set $S \subseteq [N]$ and a possible prefix PREFIX, the value $P_{A,\mathrm{PREFIX}}(S)$ is the probability that
algorithm $A$, having had the transcript of its execution thus far be PREFIX, generates set $S$ as
its $t$-th query set. For any query set $S \subseteq [N]$, let us write $S$ as a disjoint union $S = S_0 \cup S_1$, where
$S_0 = S \cap [1, \frac{N}{2}]$ and $S_1 = S \cap [\frac{N}{2} + 1, N]$. We may assume that every query $S$ ever used by $A$
has $|S_0|, |S_1| \ge 1$ (for otherwise $A$ could perfectly simulate the response of COND$_D(S)$ whether
$D$ were $\mathcal{U}$ or $D_{\mathrm{no}}$ by simply choosing a uniform point from $S$, so there would be no need to call
COND$_D$ on such an $S$). Thus we may assume that $P_{A,\mathrm{PREFIX}}(S)$ is nonzero only for sets $S$ that
have $|S_0|, |S_1| \ge 1$.

Consider the bit $b_t \in \{0, 1\}$. As noted above, we inductively have that (whether $D$ is $\mathcal{U}$ or $D_{\mathrm{no}}$)
the algorithm $A'$ has perfectly simulated the execution of $A$ for its first $t - 1$ query-response pairs;
in this simulation some prefix $\mathrm{PREFIX} = ((S_1, s_1), \dots, (S_{t-1}, s_{t-1}))$ of query-response pairs has
been constructed. If $b = (b_1, \dots, b_q)$ is distributed according to $D_0$ then PREFIX is distributed
exactly according to the distribution of $A$'s prefixes of length $t - 1$ when $A$ is run with COND$_{\mathcal{U}}$,
and if $b = (b_1, \dots, b_q)$ is distributed according to $D_{4\epsilon}$ then the distribution of PREFIX is exactly
the distribution of $A$'s prefixes of length $t - 1$ when $A$ is run with COND$_{D_{\mathrm{no}}}$.

Algorithm $A'$ simulates the $t$-th stage of the execution of $A$ as follows:

1. Randomly choose a set $S \subseteq [N]$ according to the distribution $P_{A,\mathrm{PREFIX}}$; let $S = S_0 \cup S_1$ be
the set that is selected. Let us write $\alpha(S)$ to denote $|S_1|/|S_0|$ (so $\alpha(S) \in [2/N, N/2]$).

2. If $b_t = 1$ then set the bit $\sigma \in \{0, 1\}$ to be 1 with probability $u_t$ and to be 0 with probability
$1 - u_t$. If $b_t = 0$ then set $\sigma$ to be 1 with probability $v_t$ and to be 0 with probability $1 - v_t$.
(We specify the exact values of $u_t, v_t$ below.)

3. Set $s$ to be a uniform random element of $S_\sigma$. Output the query-response pair $(S_t, s_t) = (S, s)$.



It is clear that Step 1 above perfectly simulates the $t$-th query that algorithm $A$ would make
(no matter what the distribution $D$ is). To show that the $t$-th response is simulated perfectly, we
must show that

(i) if $b_t$ is uniform random over $\{0, 1\}$ then $s$ is distributed exactly as it would be distributed if
$A$ were being run on COND$_{\mathcal{U}}$ and had just proposed $S$ as a query to COND$_{\mathcal{U}}$; i.e. we must
show that $s$ is a uniform random element of $S_1$ with probability $p(\alpha) := \frac{\alpha}{\alpha+1}$ and is a uniform
random element of $S_0$ with probability $1 - p(\alpha)$.

(ii) if $b_t \in \{0, 1\}$ has $\Pr[b_t = 1] = 1/2 + 4\epsilon$, then $s$ is distributed exactly as it would be distributed
if $A$ were being run on COND$_{D_{\mathrm{no}}}$ and had just proposed $S$ as a query to COND$_{D_{\mathrm{no}}}$; i.e. we
must show that $s$ is a uniform random element of $S_1$ with probability $q(\alpha) := \frac{\alpha}{\alpha + (1+2\epsilon)/(1-2\epsilon)}$
and is a uniform random element of $S_0$ with probability $1 - q(\alpha)$.

By (i), we require that

$$\frac{u_t}{2} + \frac{v_t}{2} = p(\alpha) = \frac{\alpha}{\alpha+1}, \tag{18}$$

and by (ii) we require that

$$\left(\frac{1}{2} + 4\epsilon\right) u_t + \left(\frac{1}{2} - 4\epsilon\right) v_t = q(\alpha) = \frac{\alpha}{\alpha + (1+2\epsilon)/(1-2\epsilon)}. \tag{19}$$

It is straightforward to check that

$$u_t = \frac{\alpha + 2\alpha^2 + 4\alpha\epsilon - 4\alpha^2\epsilon}{2 + 4\alpha + 2\alpha^2 + 4\epsilon - 4\alpha^2\epsilon}, \qquad v_t = \frac{3\alpha + 2\alpha^2 + 4\alpha\epsilon - 4\alpha^2\epsilon}{2 + 4\alpha + 2\alpha^2 + 4\epsilon - 4\alpha^2\epsilon}$$

satisfy the above equations, and that for $0 < \alpha$, $0 < \epsilon \le 1/8$ we have $0 < u_t, v_t < 1$. So indeed
$A'$ perfectly simulates the execution of $A$ in all stages $t = 1, \dots, q$. Finally, after simulating the
$q$-th stage algorithm $A'$ outputs whatever is output by its simulation of $A$, so Equation (17) indeed
holds. This concludes the proof of Theorem 5. ■
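The two requirements on $u_t, v_t$ form a $2 \times 2$ linear system, so the simulation probabilities can be checked mechanically. The sketch below solves the system $\frac{1}{2}u + \frac{1}{2}v = p(\alpha)$ and $(\frac{1}{2}+4\epsilon)u + (\frac{1}{2}-4\epsilon)v = q(\alpha)$ in exact rational arithmetic and compares the result with a closed form obtained by carrying out that elimination by hand. The function names are illustrative, and the closed form in `closed_form` is a derivation under the definitions of $p(\alpha)$ and $q(\alpha)$ above, offered as a check rather than as a quotation.

```python
from fractions import Fraction

def p(alpha):
    """Probability of landing in S_1 under the uniform distribution."""
    return alpha / (alpha + 1)

def q(alpha, eps):
    """Probability of landing in S_1 under D_no."""
    return alpha / (alpha + (1 + 2 * eps) / (1 - 2 * eps))

def solve_uv(alpha, eps):
    """Solve u/2 + v/2 = p and (1/2+4eps)u + (1/2-4eps)v = q for (u, v)."""
    total = 2 * p(alpha)                            # u + v
    diff = (q(alpha, eps) - p(alpha)) / (4 * eps)   # u - v
    return (total + diff) / 2, (total - diff) / 2

def closed_form(alpha, eps):
    """Closed form obtained by solving the system symbolically."""
    den = 2 + 4 * alpha + 2 * alpha**2 + 4 * eps - 4 * alpha**2 * eps
    u = (alpha + 2 * alpha**2 + 4 * alpha * eps - 4 * alpha**2 * eps) / den
    v = (3 * alpha + 2 * alpha**2 + 4 * alpha * eps - 4 * alpha**2 * eps) / den
    return u, v

eps = Fraction(1, 10)
for alpha in (Fraction(1, 50), Fraction(1), Fraction(7, 3), Fraction(50)):
    u, v = solve_uv(alpha, eps)
    assert (u, v) == closed_form(alpha, eps)
    assert 0 < u < 1 and 0 < v < 1                  # valid probabilities
print(solve_uv(Fraction(1), eps))                   # alpha = 1, eps = 1/10
```

For $\alpha = 1$ and $\epsilon = 1/10$ this gives $u = 3/8$ and $v = 5/8$, so that $u + v = 2p(1) = 1$ as required by the first equation.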



5 Testing equivalence to a known distribution 

5.1 A $\mathrm{poly}(\log N, 1/\epsilon)$-query PCOND$_D$ algorithm

In this subsection we present an algorithm PCOND-Test-Known and prove the following theorem:

Theorem 6 PCOND-Test-Known is a $\tilde{O}((\log N)^4/\epsilon^4)$-query PCOND$_D$ testing algorithm for testing
equivalence to a known distribution $D^*$. That is, for every pair of distributions $D, D^*$ over $[N]$
(such that $D^*$ is fully specified and there is PCOND query access to $D$) the algorithm outputs ACCEPT
with probability at least 2/3 if $D = D^*$ and outputs REJECT with probability at least 2/3 if
$d_{TV}(D, D^*) \ge \epsilon$.
Intuition. Let $D^*$ be a fully specified distribution, and let $D$ be a distribution that may be accessed
via a PCOND$_D$ oracle. The high-level idea of the PCOND-Test-Known algorithm is the following:
As in the case of testing uniformity, we shall try to "catch" a pair of points $x, y$ such that $\frac{D(x)}{D(y)}$ differs
significantly from $\frac{D^*(x)}{D^*(y)}$ (so that calling Compare$_D$ on $\{x\}, \{y\}$ will reveal this difference). In the
uniformity case, where $D^*(z) = 1/N$ for every $z$ (so that $\frac{D^*(x)}{D^*(x)+D^*(y)} = 1/2$), to get a $\mathrm{poly}(1/\epsilon)$-query
algorithm it was sufficient to show that sampling $\Theta(1/\epsilon)$ points uniformly (i.e., according to
$D^*$) with high probability yields a point $x$ for which $D(x) < D^*(x) - \Omega(\epsilon/N)$, and that sampling
$\Theta(1/\epsilon)$ points from SAMP$_D$ with high probability yields a point $y$ for which $D(y) > D^*(y) + \Omega(\epsilon/N)$.
However, for general $D^*$ it is not sufficient to get such a pair because it is possible that $D^*(y)$ could
be much larger than $D^*(x)$. If this were the case then it might happen that both $\frac{D^*(x)}{D^*(y)}$ and $\frac{D(x)}{D(y)}$
are very small, so calling Compare$_D$ on $\{x\}, \{y\}$ cannot efficiently demonstrate that $\frac{D(x)}{D(y)}$ differs
from $\frac{D^*(x)}{D^*(y)}$.

To address this issue we partition the points into $O(\log(N/\epsilon))$ "buckets" so that within each
bucket all points have similar probability according to $D^*$. We show that if $D$ is $\epsilon$-far from $D^*$,
then either the probability weight of one of these buckets according to $D$ differs significantly from
what it is according to $D^*$ (which can be observed by sampling from $D$), or we can get a pair $\{x, y\}$
that belong to the same bucket and for which $D(x)$ is sufficiently smaller than $D^*(x)$ and $D(y)$ is
sufficiently larger than $D^*(y)$. For such a pair Compare will efficiently give evidence that $D$ differs
from $D^*$.

The algorithm and its analysis. We define some quantities that are used in the algorithm and
its analysis. Let $\eta := \epsilon/c$ for some sufficiently large constant $c$ that will be determined later. As
described above we partition the domain elements $[N]$ into "buckets" according to their probability
weight in $D^*$. Specifically, for $j = 1, \dots, \lceil \log(N/\eta) + 1 \rceil$, we let

$$B_j := \{ x \in [N] : 2^{j-1} \cdot \eta/N \le D^*(x) < 2^j \cdot \eta/N \}$$

and we let $B_0 := \{ x \in [N] : D^*(x) < \eta/N \}$. Let $b := \lceil \log(N/\eta) + 1 \rceil + 1$ denote the number of
buckets.

We further define $J^h := \{ j : D^*(B_j) \ge \eta/b \}$ to denote the set of indices of "heavy" buckets,
and let $J^\ell := \{ j : D^*(B_j) < \eta/b \}$ denote the set of indices of "light" buckets. Note that we have

$$\sum_{j \in J^\ell \cup \{0\}} D^*(B_j) \le 2\eta. \tag{20}$$
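The bucketing of the known distribution $D^*$ can be computed directly from its explicit description. The following sketch assigns each point to its bucket, classifies buckets as heavy or light, and checks the bound of Equation (20) on a toy instance. The function names, the toy distribution, and the choice $\eta = 1/2$ are illustrative assumptions; exact rationals keep the bucket boundaries free of rounding issues.

```python
import math
from fractions import Fraction

def bucket_index(p_star, N, eta):
    """Index j with p_star in B_j: B_0 = [0, eta/N), B_j = [2^(j-1) eta/N, 2^j eta/N)."""
    j, upper = 0, eta / N
    while p_star >= upper:
        j += 1
        upper *= 2
    return j

def bucket_weights(D_star, eta):
    """Return (b, weights) where weights[j] = D*(B_j) for j = 0, ..., b-1."""
    N = len(D_star)
    b = math.ceil(math.log2(N / eta) + 1) + 1       # number of buckets
    weights = [Fraction(0)] * b
    for p_star in D_star:
        weights[bucket_index(p_star, N, eta)] += p_star
    return b, weights

# Hypothetical known distribution over N = 8 points.
D_star = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16),
          Fraction(1, 32), Fraction(1, 32), Fraction(0), Fraction(0)]
eta = Fraction(1, 2)
b, weights = bucket_weights(D_star, eta)
heavy = {j for j in range(b) if weights[j] >= eta / b}
light = set(range(b)) - heavy
light_mass = sum(weights[j] for j in light | {0})
print(b, sorted(heavy), sorted(light), light_mass)
```

The doubling loop in `bucket_index` is used instead of a floating-point logarithm so that points lying exactly on a bucket boundary are classified consistently with the definition.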

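The query complexity of the algorithm is dominated by the number of PCOND$_D$ queries performed
in the executions of Compare, which by Lemma 2 is upper bounded by

$$O\left( s^2 \cdot \frac{b^2 \log(s)}{\eta^2} \right) = O\left( \frac{b^4 \log(b/\epsilon)}{\epsilon^4} \right).$$

We argue completeness and soundness below.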
Algorithm 4: PCOND$_D$-Test-Known
Input: error parameter $\epsilon > 0$; query access to PCOND$_D$ oracle; explicit description
$(D^*(1), \dots, D^*(N))$ of distribution $D^*$
1: Call the SAMP$_D$ oracle $m = \Theta(b^2 (\log b)/\eta^2)$ times to obtain points $h_1, \dots, h_m$ distributed
according to $D$.
2: for $j = 0$ to $b$ do
3: Let $\hat{D}(B_j)$ be the fraction of points $h_1, \dots, h_m$ that lie in $B_j$.
4: if some $j$ has $|D^*(B_j) - \hat{D}(B_j)| > \eta/b$ then
5: output REJECT and exit
6: end if
7: end for
8: Select $s = \Theta(b/\epsilon)$ points $x_1, \dots, x_s$ independently from $D^*$.
9: Call the SAMP$_D$ oracle $s = \Theta(b/\epsilon)$ times to obtain points $y_1, \dots, y_s$ distributed according to
$D$.
10: for all pairs $(x_i, y_j)$ (where $1 \le i, j \le s$) such that $\frac{D^*(x_i)}{D^*(y_j)} \in [1/2, 2]$ do
11: Call Compare$_D(\{x_i\}, \{y_j\}, \eta/(4b), 2, 1/(10s^2))$
12: if Compare returns Low or a value smaller than $(1 - \eta/(2b)) \cdot \frac{D^*(x_i)}{D^*(y_j)}$ then
13: output REJECT (and exit)
14: end if
15: end for
16: output ACCEPT



Completeness: Suppose that $D = D^*$. Since the expected value of $\hat{D}(B_j)$ (defined in Line 3) is
precisely $D^*(B_j)$, for any fixed value of $j \in \{0, \dots, \lceil \log(N/\eta) + 1 \rceil\}$ an additive Chernoff bound
implies that $|D^*(B_j) - \hat{D}(B_j)| > \eta/b$ occurs with probability at most $1/(10b)$. By a union bound
over all $b$ values of $j$, the algorithm outputs REJECT in Line 5 with probability at most 1/10.
Later in the algorithm, since $D = D^*$, no matter what points $x_i, y_j$ are sampled from $D^*$ and $D$
respectively, the following holds for each pair $(x_i, y_j)$ such that $D^*(x_i)/D^*(y_j) \in [1/2, 2]$. By Lemma 2
(and the setting of the parameters in the calls to Compare), the probability that Compare returns
Low or a value smaller than $(1 - \eta/(2b)) \cdot (D^*(x_i)/D^*(y_j))$ is at most $1/(10s^2)$. A union bound over all
(at most $s^2$) pairs $(x_i, y_j)$ for which $D^*(x_i)/D^*(y_j) \in [1/2, 2]$ gives that the probability of outputting
REJECT in Line 13 is at most 1/10. Thus with overall probability at least 8/10 the algorithm
outputs ACCEPT.



Soundness: Now suppose that $d_{TV}(D, D^*) \ge \epsilon$; our goal is to show that the algorithm rejects
with probability at least 2/3. Since the algorithm rejects if any estimate $\hat{D}(B_j)$ obtained in Line 3
deviates from $D^*(B_j)$ by more than $\pm\eta/b$, we may assume that all these estimates are indeed
$\pm\eta/b$-close to the values $D^*(B_j)$ as required. Moreover, by an additive Chernoff bound (as in
the completeness analysis), we have that with overall failure probability at most 1/10, each $j$ has
$|\hat{D}(B_j) - D(B_j)| \le \eta/b$; we condition on this event going forth. Thus, for every $0 \le j \le b$,

$$D^*(B_j) - 2\eta/b \le D(B_j) \le D^*(B_j) + 2\eta/b. \tag{21}$$

Recalling the definition of $J^\ell$ and Equation (20), we see that

$$\sum_{j \in J^\ell \cup \{0\}} D(B_j) \le 4\eta. \tag{22}$$

Let

$$d_j := \sum_{x \in B_j} |D^*(x) - D(x)|, \tag{23}$$

so that $\|D^* - D\|_1 = \sum_j d_j$. By Equations (20) and (22), we have

$$\sum_{j \in J^\ell \cup \{0\}} d_j \le \sum_{j \in J^\ell \cup \{0\}} (D^*(B_j) + D(B_j)) \le 6\eta. \tag{24}$$

Since we have (by assumption) that $\|D^* - D\|_1 = 2 d_{TV}(D^*, D) \ge 2\epsilon$, we get that

$$\sum_{j \in J^h \setminus \{0\}} d_j > 2\epsilon - 6\eta. \tag{25}$$

Let Nj '= \Bj\ and observe that Nj < D* {Bj)/pj < l/pj, where pj '= 2^ ^ • rj/N is the lower 
bound on the probabiUty (under D*) of ah elements in Bj. For each Bj such that j ^ J^\ {0}, let 

Hj =^ {x G Bj : D{x) > D*{x)} and Lj =^ {x G Bj : D{x) < D*{x)}. Similarly to the "testing 
uniformity" analysis, we have that 



E 



{D*{x) - D{x)) + Y iD{x) - D*{x)) = dj 



(26) 



Equation (j2ip may be rewritten as 



YiD*{x)-D{x))- YiD{x)-D*{x)) 



(27) 



and so we have both 

Y(.D*{x) - D{x))>dj/2-7]/b and ^ (L'(x) - L'*(x)) > (ij/2 - . (28) 



xGLi 



xeHi 



Also similarly to what we had before, let Hj '= {x G Bj : D{x) > D*{x) + 7]/{bNj)}, and 

L'j = {x G Bj : D{x) < D*{x) - rj/ibNj)} (recah that Nj = \Bj\); these are the element of Bj that 
are "significantly heavier" (lighter, respectively) under D than under D* . We have 

Y {D*{x) - D{x)) < r]/b and ^ {D{x) - D*{x)) < T]/b . (29) 

x£Lj\L'. x(iHj\H'. 

By Equation (|25p . there exists j* G J'*\{0} for which dj* > {2e — 6r])/b. For this index, applying 



(30) 



Equations (p8|) and (p9|) . we get that 



Y D*{x) > Y (D*{x) - D{x)) > (e - 5r?)/6 
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and similarly, 

J2 ^(^) ^ E (^(^) - ^*(^)) ^ - 5^)/^ • (31) 

Recalling that $\eta = \epsilon/6$, we have that $(\epsilon - 5\eta)/b = \epsilon/(6b)$. Now since $s = \Theta(b/\epsilon)$, with probability
at least 9/10 it is the case both that some $x_i$ drawn in Line 8 belongs to $L'_{j^*}$ and that some $y_{i'}$
drawn in Line 9 belongs to $H'_{j^*}$. By the definitions of $L'_{j^*}$ and $H'_{j^*}$ and the fact that for each $j > 0$ it
holds that $N_j \le 1/p_j$ and $p_j \le D^*(x) \le 2p_j$ for each $x \in B_j$, we have that

\[ D(x_i) < D^*(x_i) - \eta/(bN_{j^*}) \le D^*(x_i) - (\eta/b)p_{j^*} \le (1 - \eta/(2b)) D^*(x_i) \tag{32} \]

and

\[ D(y_{i'}) > D^*(y_{i'}) + \eta/(bN_{j^*}) \ge D^*(y_{i'}) + (\eta/b)p_{j^*} \ge (1 + \eta/(2b)) D^*(y_{i'}) . \tag{33} \]

Therefore

\[ \frac{D(x_i)}{D(y_{i'})} < \frac{1 - \eta/(2b)}{1 + \eta/(2b)} \cdot \frac{D^*(x_i)}{D^*(y_{i'})} < \Bigl(1 - \frac{3\eta}{4b}\Bigr) \cdot \frac{D^*(x_i)}{D^*(y_{i'})} . \tag{34} \]

By Lemma 2, with probability at least $1 - 1/(10s^2)$, the output of Compare is either Low or is at
most $\ldots$, causing the algorithm to reject. Thus the overall probability
that the algorithm outputs REJECT is at least $8/10 - 1/(10s^2) > 2/3$, and the theorem is proved.



5.2 A $(\log N)^{\Omega(1)}$ lower bound for PCOND$_D$

In this subsection we prove that any PCOND$_D$ algorithm for testing equivalence to a known
distribution must have query complexity at least $(\log N)^{\Omega(1)}$:

Theorem 7 Fix $\epsilon = 1/2$. There is a distribution $D^*$ over $[N]$ (described below), which is such
that any PCOND$_D$ algorithm for testing whether $D = D^*$ versus $d_{\rm TV}(D, D^*) \ge \epsilon$ must make
$\Omega\bigl(\sqrt{\log N / \log\log N}\bigr)$ queries.

The distribution $D^*$. Fix parameters $r = \Theta(\log N / \log\log N)$ and $K = \Theta(\log N)$. We partition $[N]$
from left (1) to right ($N$) into $2r$ consecutive intervals $B_1, \dots, B_{2r}$, which we henceforth refer to
as "buckets." The $i$-th bucket has $|B_i| = K^i$ (we may assume without loss of generality that $N$
is of the form $\sum_{i=1}^{2r} K^i$). The distribution $D^*$ assigns equal probability weight to each bucket, so
$D^*(B_i) = 1/(2r)$ for all $1 \le i \le 2r$. Moreover $D^*$ is uniform within each bucket, so for all $j \in B_i$
we have $D^*(j) = 1/(2rK^i)$. This completes the specification of $D^*$.

To prove the lower bound we construct a probability distribution $\mathcal{P}_{\rm No}$ over possible
"No"-distributions. To define the distribution it will be useful to have the notion of a "bucket-pair."
A bucket-pair $U_i$ is $U_i = B_{2i-1} \cup B_{2i}$, i.e. the union of the $i$-th pair of consecutive buckets.

A distribution $D$ drawn from $\mathcal{P}_{\rm No}$ is obtained by selecting a string $\pi = (\pi_1, \dots, \pi_r)$ uniformly
at random from $\{\downarrow\uparrow, \uparrow\downarrow\}^r$ and setting $D$ to be $D_\pi$, which we now define. The distribution $D_\pi$ is
obtained by perturbing $D^*$ in the following way: for each bucket-pair $U_i = (B_{2i-1}, B_{2i})$,






• If $\pi_i = \uparrow\downarrow$ then the weight of $B_{2i-1}$ is uniformly "scaled up" from $1/(2r)$ to $3/(4r)$ (keeping
the distribution uniform within $B_{2i-1}$) and the weight of $B_{2i}$ is uniformly "scaled down" from
$1/(2r)$ to $1/(4r)$ (likewise keeping the distribution uniform within $B_{2i}$).

• If $\pi_i = \downarrow\uparrow$ then the weight of $B_{2i-1}$ is uniformly "scaled down" from $1/(2r)$ to $1/(4r)$ and the
weight of $B_{2i}$ is uniformly "scaled up" from $1/(2r)$ to $3/(4r)$.

Note that for any distribution $D$ in the support of $\mathcal{P}_{\rm No}$ and any $1 \le i \le r$ we have that
$D(U_i) = D^*(U_i) = 1/r$.
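
The construction above can be made concrete with a short sketch. The function names `build_yes_and_no` and `pair_mass`, and the encoding of each coordinate of the perturbation string as "UD" (first bucket scaled up, second scaled down) or "DU" (the reverse), are illustrative assumptions and not notation from the text. Exact rational arithmetic is used so that the bucket-pair masses can be compared exactly.

```python
from fractions import Fraction

def build_yes_and_no(r, K, pi):
    """Return (buckets, d_star, d_pi) for the bucket construction.

    buckets[i-1] is the range of points in B_i (with |B_i| = K**i); d_star and
    d_pi map each point to its probability. pi is a list of r strings:
    "UD" scales the first bucket of the pair up and the second down,
    "DU" does the reverse (hypothetical encoding chosen for this sketch).
    """
    buckets, start = [], 1
    for i in range(1, 2 * r + 1):
        buckets.append(range(start, start + K ** i))
        start += K ** i
    d_star, d_pi = {}, {}
    for i, bucket in enumerate(buckets, start=1):
        pair = (i + 1) // 2              # B_i lies in bucket-pair U_pair
        is_first = (i % 2 == 1)          # True when B_i is B_{2*pair - 1}
        scaled_up = (pi[pair - 1] == "UD") == is_first
        weight = Fraction(3 if scaled_up else 1, 4 * r)
        for x in bucket:
            d_star[x] = Fraction(1, 2 * r * len(bucket))   # uniform, bucket mass 1/(2r)
            d_pi[x] = weight / len(bucket)                 # uniform, bucket mass 3/(4r) or 1/(4r)
    return buckets, d_star, d_pi

def pair_mass(dist, buckets, i):
    """Mass of bucket-pair U_i (the union of B_{2i-1} and B_{2i}) under dist."""
    return sum(dist[x] for b in (buckets[2 * i - 2], buckets[2 * i - 1]) for x in b)
```

With small parameters such as r = 2 and K = 3, every bucket-pair has mass exactly 1/r under both distributions, which is the property the SAMP part of the lower-bound argument relies on.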

Every distribution $D$ in the support of $\mathcal{P}_{\rm No}$ has $d_{\rm TV}(D^*, D) = 1/2$. Thus Theorem 7 follows
immediately from the following:



Theorem 8 Let $A$ be any (possibly adaptive) algorithm, which makes at most $q \le \ldots \cdot \sqrt{r}$ calls to
PCOND$_D$. Then

\[ \Bigl| \Pr_{D \leftarrow \mathcal{P}_{\rm No}}\bigl[A^{{\rm PCOND}_D} \text{ outputs ACCEPT}\bigr] - \Pr\bigl[A^{{\rm PCOND}_{D^*}} \text{ outputs ACCEPT}\bigr] \Bigr| \le 1/5. \tag{35} \]

Note that in the first probability of Equation (35) the randomness is over the draw of $D$ from $\mathcal{P}_{\rm No}$,
the internal randomness of $A$ in selecting its query sets, and the randomness of the responses to the
PCOND$_D$ queries. In the second probability the randomness is just over the internal coin tosses of
$A$ and the randomness of the responses to the PCOND$_{D^*}$ queries.

Intuition for Theorem 8. A very high-level intuition for the lower bound is that PCOND$_D$ queries
are only useful for "comparing" points whose probabilities are within a reasonable multiplicative
ratio of each other. But $D^*$ and every distribution $D$ in the support of $\mathcal{P}_{\rm No}$ are such that every
two points either have the same probability mass under all of these distributions (so a PCOND$_D$
query is not informative), or else the ratio of their probabilities is so skewed that a small number
of PCOND$_D$ queries is not useful for comparing them.

In more detail, we may suppose without loss of generality that in every possible execution,
algorithm $A$ first makes $q$ calls to SAMP$_D$ and then makes $q$ (possibly adaptive) calls to PCOND$_D$.
The more detailed intuition for the lower bound is as follows: First consider the SAMP$_D$ calls. Since
every possible $D$ (whether $D^*$ or a distribution drawn from $\mathcal{P}_{\rm No}$) puts weight $1/r$ on each
bucket-pair $U_1, \dots, U_r$, a birthday paradox argument implies that in both scenarios, with probability at
least 9/10 (over the randomness in the responses to the SAMP$_D$ queries) no two of the $q \le \ldots \cdot \sqrt{r}$
calls to SAMP$_D$ return points from the same bucket-pair. Conditioned on this, the distribution
of responses to the SAMP$_D$ queries is exactly the same under $D^*$ and under $D$ where $D$ is drawn
randomly from $\mathcal{P}_{\rm No}$.

For the pair queries, the intuition is that in either setting (whether the distribution $D$ is $D^*$
or a randomly chosen distribution from $\mathcal{P}_{\rm No}$), making $q$ pair queries will with $1 - o(1)$ probability
provide no information that the tester could not simulate for itself. This is because any pair query
PCOND$_D(\{x, y\})$ either has $x, y$ in the same bucket $B_i$ or in different buckets $B_i \ne B_j$ with $i < j$.
If $x, y$ are both in the same bucket $B_i$ then in either setting PCOND$_D(\{x, y\})$ is equally likely to
return $x$ or $y$. If they belong to buckets $B_i, B_j$ with $i < j$ then in either setting PCOND$_D(\{x, y\})$
will return the one that belongs to $B_i$ with probability $1 - 1/\Theta(K^{j-i}) \ge 1 - 1/\Theta(K)$.
Proof of Theorem 8: As described above, we may fix $A$ to be any PCOND$_D$ algorithm that
makes exactly $q$ calls to SAMP$_D$ followed by exactly $q$ adaptive calls to PCOND$_D$.

A transcript for $A$ is a full specification of the sequence of interactions that $A$ has with the
PCOND$_D$ oracle in a given execution. More precisely, it is a pair $(Y, Z)$ where $Y = (s_1, \dots, s_q) \in [N]^q$
and $Z = ((\{x_1, y_1\}, p_1), \dots, (\{x_q, y_q\}, p_q))$, where $p_i \in \{x_i, y_i\}$ and $x_i, y_i \in [N]$. The idea is
that $Y$ is a possible sequence of responses that $A$ might receive to the initial $q$ SAMP$_D$ queries,
$\{x_i, y_i\}$ is a possible pair that could be the input to an $i$-th PCOND$_D$ query, and $p_i$ is a possible
response that could be received from that query.

We say that a length-$i$ transcript prefix is a pair $(Y, Z^i)$ where $Y$ is as above and
$Z^i = ((\{x_1, y_1\}, p_1), \dots, (\{x_i, y_i\}, p_i))$. A PCOND algorithm $A$ may be viewed as a collection of
distributions over pairs $\{x, y\}$ in the following way: for each length-$i$ transcript prefix $(Y, Z^i)$ ($0 \le i \le q - 1$),
there is a distribution over pairs $\{x_{i+1}, y_{i+1}\}$ that $A$ would use to select the $(i+1)$-st query pair for
PCOND$_D$ given that the length-$i$ transcript prefix of $A$'s execution thus far was $(Y, Z^i)$. We write
$T_{(Y, Z^i)}$ to denote this distribution over pairs.

Let $P^*$ denote the distribution over transcripts induced by running $A$ with oracle PCOND$_{D^*}$.
Let $P^{\rm No}$ denote the distribution over transcripts induced by first (i) drawing $D$ from $\mathcal{P}_{\rm No}$, and
then (ii) running $A$ with oracle PCOND$_D$. To prove Theorem 8 it is sufficient to prove that the
distribution over transcripts of $A$ is statistically close whether the oracle is $D^*$ or is a random $D$
drawn from $\mathcal{P}_{\rm No}$, i.e. it is sufficient to prove that

\[ d_{\rm TV}(P^*, P^{\rm No}) \le 1/5. \tag{36} \]

For our analysis we will need to consider variants of algorithm $A$ that, rather than making $q$
calls to PCOND$_D$, instead "fake" the final $q - k$ of these PCOND$_D$ queries as described below. For
$0 \le k \le q$ we define $A^{(k)}$ to be the algorithm that works as follows:

1. $A^{(k)}$ exactly simulates the execution of $A$ in making an initial $q$ SAMP$_D$ calls and making the
first $k$ PCOND$_D$ queries precisely like $A$. Let $(Y, Z^k)$ be the length-$k$ transcript prefix of $A$'s
execution thus obtained.

2. Exactly like $A$, algorithm $A^{(k)}$ draws a pair $\{x_{k+1}, y_{k+1}\}$ from $T_{(Y, Z^k)}$. However, instead of
calling PCOND$_D(\{x_{k+1}, y_{k+1}\})$ to obtain $p_{k+1}$, algorithm $A^{(k)}$ generates $p_{k+1}$ in the following
manner:

(i) If $x_{k+1}$ and $y_{k+1}$ both belong to the same bucket then $p_{k+1}$ is chosen uniformly from
$\{x_{k+1}, y_{k+1}\}$.

(ii) If one of $\{x_{k+1}, y_{k+1}\}$ belongs to $B_\ell$ and the other belongs to $B_{\ell'}$ for some $\ell < \ell'$, then
$p_{k+1}$ is set to be the element of $\{x_{k+1}, y_{k+1}\}$ that belongs to $B_\ell$.

Let $(Y, Z^{k+1})$ be the length-$(k+1)$ transcript prefix obtained by appending
$(\{x_{k+1}, y_{k+1}\}, p_{k+1})$ to $Z^k$. Algorithm $A^{(k)}$ continues in this way for a total of $q - k$ stages;
i.e. it next draws $\{x_{k+2}, y_{k+2}\}$ from $T_{(Y, Z^{k+1})}$ and generates $p_{k+2}$ as described above; then
$(Y, Z^{k+2})$ is the length-$(k+2)$ transcript prefix obtained by appending $(\{x_{k+2}, y_{k+2}\}, p_{k+2})$
to $Z^{k+1}$; and so on. At the end of the process a transcript $(Y, Z^q)$ has been constructed.
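
The faking rules 2(i) and 2(ii) can be sketched in a few lines. This is a minimal illustration, assuming the bucket layout described earlier (bucket $B_i$ has $K^i$ points and the buckets are laid out left to right starting at point 1); the helper names `bucket_of`, `fake_response` and `disagreement_prob_dstar` are hypothetical. The last function computes, under $D^*$, the exact probability that a genuine pair query would disagree with rule 2(ii).

```python
import random
from fractions import Fraction

def bucket_of(x, K):
    """Index i of the bucket B_i containing point x, when |B_i| = K**i."""
    i, end = 1, K
    while x > end:
        i += 1
        end += K ** i
    return i

def fake_response(x, y, K, rng=random):
    """Simulated answer to a pair query on {x, y} without touching the oracle."""
    bx, by = bucket_of(x, K), bucket_of(y, K)
    if bx == by:
        return rng.choice((x, y))    # rule 2(i): same bucket, uniform choice
    return x if bx < by else y       # rule 2(ii): the point in the lower-indexed bucket

def disagreement_prob_dstar(l, l_prime, K):
    """Probability that a true pair query under D* returns the point in B_{l'}
    rather than the point in B_l, for l < l'.

    Under D*, a point of B_i has mass 1/(2r K^i), so the ratio is
    K^{-l'} / (K^{-l} + K^{-l'}) = 1 / (1 + K^{l'-l}).
    """
    return Fraction(1, 1 + K ** (l_prime - l))
```

For adjacent buckets the disagreement probability is $1/(1+K)$, and it shrinks geometrically as the buckets get further apart, which is the $1/\Theta(K^{\ell'-\ell})$ behaviour used in the analysis.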
Let $P^{*,(k)}$ denote the distribution over final transcripts $(Y, Z^q)$ that are obtained by running
$A^{(k)}$ on a PCOND$_{D^*}$ oracle. Let $P^{{\rm No},(k)}$ denote the distribution over final transcripts $(Y, Z^q)$ that
are obtained by (i) first drawing $D$ from $\mathcal{P}_{\rm No}$ and then (ii) running $A^{(k)}$ on a PCOND$_D$ oracle.
Note that $P^{*,(q)}$ is identical to $P^*$ and $P^{{\rm No},(q)}$ is identical to $P^{\rm No}$ (since algorithm $A^{(q)}$, which does
not fake any queries, is identical to algorithm $A$).

Recall that our goal is to prove Equation (36). Since $P^{*,(q)} = P^*$ and $P^{{\rm No},(q)} = P^{\rm No}$,
Equation (36) is an immediate consequence (using the triangle inequality for total variation distance) of
the following two lemmas, which we prove below:

Lemma 5 $d_{\rm TV}(P^{*,(0)}, P^{{\rm No},(0)}) \le 1/10$.

Lemma 6 For all $0 \le k < q$, we have $d_{\rm TV}(P^{*,(k)}, P^{*,(k+1)}) \le 1/(20q)$ and
$d_{\rm TV}(P^{{\rm No},(k)}, P^{{\rm No},(k+1)}) \le 1/(20q)$.
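
Spelled out, the triangle-inequality step chains the hybrids from $P^*$ down to the fully faked transcripts and back up to $P^{\rm No}$; this is a routine expansion that uses only Lemmas 5 and 6:

\begin{align*}
d_{\rm TV}(P^*, P^{\rm No})
  &\le \sum_{k=0}^{q-1} d_{\rm TV}\bigl(P^{*,(k+1)}, P^{*,(k)}\bigr)
     + d_{\rm TV}\bigl(P^{*,(0)}, P^{{\rm No},(0)}\bigr)
     + \sum_{k=0}^{q-1} d_{\rm TV}\bigl(P^{{\rm No},(k)}, P^{{\rm No},(k+1)}\bigr) \\
  &\le q \cdot \frac{1}{20q} + \frac{1}{10} + q \cdot \frac{1}{20q} = \frac{1}{5} .
\end{align*}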

Proof of Lemma 5: Define $P^*_0$ to be the distribution over outcomes of the $q$ calls to SAMP$_D$ (i.e.
over length-0 transcript prefixes) when $D = D^*$. Define $P^{\rm No}_0$ to be the distribution over outcomes of
the $q$ calls to SAMP$_D$ when $D$ is drawn from $\mathcal{P}_{\rm No}$. We begin by noting that by the data processing
inequality for total variation distance, we have $d_{\rm TV}(P^{*,(0)}, P^{{\rm No},(0)}) \le d_{\rm TV}(P^*_0, P^{\rm No}_0)$ (indeed, after
the calls to respectively SAMP$_D$ and SAMP$_{D^*}$, the same randomized function $F$, which fakes all
remaining oracle calls, is applied to the two resulting distributions over length-0 transcript prefixes
$P^*_0$ and $P^{\rm No}_0$). In the rest of the proof we show that $d_{\rm TV}(P^*_0, P^{\rm No}_0) \le 1/10$.

Let $E$ denote the event that the $q$ calls to SAMP$_D$ yield points $s_1, \dots, s_q$ such that no bucket-pair
$U_i$ contains more than one of these points. Since $D^*(U_i) = 1/r$ for all $i$,

\[ P^*_0(E) = \prod_{i=0}^{q-1} \Bigl(1 - \frac{i}{r}\Bigr) \ge 9/10, \tag{37} \]

where Equation (37) follows from a standard birthday paradox analysis and the fact that $q \le \ldots \cdot \sqrt{r}$.
Since for each possible outcome of $D$ drawn from $\mathcal{P}_{\rm No}$ we have $D(U_i) = 1/r$ for all $i$, we further
have that also

\[ P^{\rm No}_0(E) = \prod_{i=0}^{q-1} \Bigl(1 - \frac{i}{r}\Bigr). \tag{38} \]

We moreover claim that the two conditional distributions $(P^*_0 | E)$ and $(P^{\rm No}_0 | E)$ are identical, i.e.

\[ (P^*_0 | E) = (P^{\rm No}_0 | E). \tag{39} \]

To see this, fix any sequence $(\ell_1, \dots, \ell_q) \in [r]^q$ such that $\ell_i \ne \ell_j$ for all $i \ne j$. Let $(s_1, \dots, s_q) \in [N]^q$
denote a draw from $(P^*_0 | E)$. The probability that ($s_i \in U_{\ell_i}$ for all $1 \le i \le q$) is precisely $1/r^q$. Now
given that $s_i \in U_{\ell_i}$ for all $i$, it is clear that $s_i$ is equally likely to lie in $B_{2\ell_i - 1}$ and in $B_{2\ell_i}$, and
given that it lies in a particular one of the two buckets, it is equally likely to be any element in
that bucket. This is true independently for all $1 \le i \le q$.

Now let $(s_1, \dots, s_q) \in [N]^q$ denote a draw from $(P^{\rm No}_0 | E)$. Since each distribution $D$ in the
support of $\mathcal{P}_{\rm No}$ has $D(U_i) = 1/r$ for all $i$, we likewise have that the probability that ($s_i \in U_{\ell_i}$ for
all $1 \le i \le q$) is precisely $1/r^q$. Now given that $s_i \in U_{\ell_i}$ for all $i$, we have that $s_i$ is equally likely
to lie in $B_{2\ell_i - 1}$ and in $B_{2\ell_i}$; this is because $\pi_{\ell_i}$ (recall that $\pi$ determines $D = D_\pi$) is equally likely
to be $\uparrow\downarrow$ (in which case $D(B_{2\ell_i - 1}) = 3/(4r)$ and $D(B_{2\ell_i}) = 1/(4r)$) as it is to be $\downarrow\uparrow$ (in which case
$D(B_{2\ell_i - 1}) = 1/(4r)$ and $D(B_{2\ell_i}) = 3/(4r)$). Additionally, given that $s_i$ lies in a particular one of
the two buckets, it is equally likely to be any element in that bucket. This is true independently
for all $1 \le i \le q$ (because conditioning on $E$ ensures that no two elements of $s_1, \dots, s_q$ lie in the
same bucket-pair, so there is "fresh randomness for each $i$"), and so indeed the two conditional
distributions $(P^*_0 | E)$ and $(P^{\rm No}_0 | E)$ are identical.

Finally, the claimed bound $d_{\rm TV}(P^*_0, P^{\rm No}_0) \le 1/10$ follows directly from Equations (37), (38)
and (39). ■
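
The birthday-paradox bound in Equation (37) is easy to check numerically. The sketch below evaluates the product directly; the choice $q = \lfloor \sqrt{r} \rfloor / 3$ (integer division) is an illustrative assumption made only for this example, and the function name is hypothetical.

```python
import math

def no_collision_probability(q, r):
    """Probability that q independent draws, each landing in one of r
    equally likely bucket-pairs, all land in distinct bucket-pairs."""
    prob = 1.0
    for i in range(q):
        prob *= 1 - i / r
    return prob

# Illustrative check: with q at most sqrt(r)/3 the product stays above 9/10,
# since it is at least 1 - q(q-1)/(2r) >= 1 - 1/18.
for r in (100, 10**4, 10**6):
    q = math.isqrt(r) // 3
    assert no_collision_probability(q, r) >= 0.9
```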

Proof of Lemma 6: Consider first the claim that $d_{\rm TV}(P^{*,(k)}, P^{*,(k+1)}) \le 1/(20q)$. Fix
any $0 \le k < q$. The data processing inequality for total variation distance implies that
$d_{\rm TV}(P^{*,(k)}, P^{*,(k+1)})$ is at most the variation distance between random variables $X$ and $Y$, where

• $X$ is the random variable obtained by running $A$ on PCOND$_{D^*}$ to obtain a length-$k$ transcript
prefix $(Y, Z^k)$, then drawing $\{x_{k+1}, y_{k+1}\}$ from $T_{(Y, Z^k)}$, then setting $p_{k+1}$ to be the output of
PCOND$_{D^*}(\{x_{k+1}, y_{k+1}\})$; and

• $Y$ is the random variable obtained by running $A$ on PCOND$_{D^*}$ to obtain a length-$k$ transcript
prefix $(Y, Z^k)$, then drawing $\{x_{k+1}, y_{k+1}\}$ from $T_{(Y, Z^k)}$, then setting $p_{k+1}$ according to the
rules 2(i) and 2(ii) given above.

Consider any fixed outcome of $(Y, Z^k)$ and $\{x_{k+1}, y_{k+1}\}$. If rule 2(i) is applied ($x_{k+1}$ and
$y_{k+1}$ are in the same bucket) then there is zero contribution to the variation distance
between $X$ and $Y$, because choosing a uniform element of $\{x_{k+1}, y_{k+1}\}$ is a perfect simulation of
PCOND$_{D^*}(\{x_{k+1}, y_{k+1}\})$. If rule 2(ii) is applied then the contribution is at most $O(1/K) < 1/(20q)$,
because PCOND$_{D^*}(\{x_{k+1}, y_{k+1}\})$ would return a different outcome from rule 2(ii) with probability
$1/\Theta(K^{\ell' - \ell}) = O(1/K)$. Averaging over all possible outcomes of $(Y, Z^k)$ and $\{x_{k+1}, y_{k+1}\}$ we get
that the variation distance between $X$ and $Y$ is at most $1/(20q)$ as claimed.

An identical argument shows that similarly $d_{\rm TV}(P^{{\rm No},(k)}, P^{{\rm No},(k+1)}) \le 1/(20q)$. The key
observation is that for any distribution $D$ in the support of $\mathcal{P}_{\rm No}$, as with $D^*$ it is the case that points
in the same bucket have equal probability under $D$ and a point $y$ that is $\ell' - \ell$ buckets lower than
$x$ has probability only $1/\Theta(K^{\ell' - \ell})$ of being returned by a call to PCOND$_D(\{x, y\})$. This concludes
the proof of Lemma 6 and of Theorem 7. ■

5.3 A poly$(1/\epsilon)$-query COND$_D$ algorithm

In this subsection we present an algorithm COND-Test-Known and prove the following theorem:

Theorem 9 COND-Test-Known is a $\tilde{O}(1/\epsilon^4)$-query COND$_D$ testing algorithm for testing
equivalence to a known distribution $D^*$. That is, for every pair of distributions $D, D^*$ over $[N]$ (such that
$D^*$ is fully specified and there is COND query access to $D$), the algorithm outputs ACCEPT with
probability at least 2/3 if $D = D^*$ and outputs REJECT with probability at least 2/3 if $d_{\rm TV}(D, D^*) \ge \epsilon$.

This constant-query testing algorithm stands in interesting contrast to the $(\log N)^{\Omega(1)}$-query
lower bound for PCOND$_D$ algorithms for this problem.

High-level overview of the algorithm and its analysis: First, we note that by reordering
elements of $[N]$ we may assume without loss of generality that $D^*(1) \le \cdots \le D^*(N)$; this will be
convenient for us.

Our $(\log N)^{\Omega(1)}$ query lower bound for PCOND$_D$ algorithms exploited the intuition that
comparing two points using the PCOND$_D$ oracle might not provide much information (e.g. if one of the
two points was a priori "known" to be much heavier than the other). In contrast, with a general
COND$_D$ oracle at our disposal, we can compare a given point $j \in [N]$ with any subset of $[N] \setminus \{j\}$.
Thus the following definition will be useful:

Definition 3 (comparable points) Fix $0 < \lambda \le 1$. A point $j \in {\rm supp}(D^*)$ is said to be
$\lambda$-comparable if there exists a set $S \subseteq ([N] \setminus \{j\})$ such that

\[ D^*(j) \in [\lambda D^*(S), D^*(S)/\lambda]. \]

Such a set $S$ is then said to be a $\lambda$-comparable-witness for $j$ (according to $D^*$), which is denoted
$S \cong^* j$. We say that a set $T \subseteq [N]$ is $\lambda$-comparable if every $i \in T$ is $\lambda$-comparable.

We stress that the notion of being $\lambda$-comparable deals only with the known distribution $D^*$; this
will be important later.

Fix $\epsilon_1 = \Theta(\epsilon)$ (we specify $\epsilon_1$ precisely in Equation (42) below). Our analysis and algorithm
consider two possible cases for the distribution $D^*$ (where it is not hard to verify, and we provide
an explanation subsequently, that one of the two cases must hold):

1. The first case is that for some $i^* \in [N]$ we have

\[ D^*(\{1, \dots, i^*\}) > 2\epsilon_1 \quad\text{but}\quad D^*(\{1, \dots, i^* - 1\}) \le \epsilon_1. \tag{40} \]

In this case $1 - \epsilon_1$ of the total probability mass of $D^*$ must lie on a set of at most $1/\epsilon_1$ elements,
and in such a situation it is easy to efficiently test whether $D = D^*$ using poly$(1/\epsilon)$ queries
(see Algorithm COND$_D$-Test-Known-Heavy and Lemma 9).

2. The second case is that there exists an element $k^* \in [N]$ such that

\[ \epsilon_1 < D^*(\{1, \dots, k^*\}) \le 2\epsilon_1 < D^*(\{1, \dots, k^* + 1\}). \tag{41} \]

This is the more challenging (and typical) case. In this case, it can be shown that every
element $j > k^*$ has at least one $\epsilon_1$-comparable-witness within $\{1, \dots, j\}$. In fact, we show
(see Claim 7) that either (a) $\{1, \dots, j - 1\}$ is an $\epsilon_1$-comparable witness for $j$, or (b) the set
$\{1, \dots, j - 1\}$ can be partitioned into disjoint sets $S_1, \dots, S_t$ such that each $S_i$, $1 \le i \le t$, is
a $\frac{1}{2}$-comparable-witness for $j$. Case (a) is relatively easy to handle so we focus on (b) in our
informal description below.

Footnote: In fact the sets $S_1, \dots, S_t$ are intervals (under the assumption $D^*(1) \le \cdots \le D^*(N)$), but
that is not really important for our arguments.
The partition $S_1, \dots, S_t$ is useful to us for the following reason: Suppose that $d_{\rm TV}(D, D^*) \ge \epsilon$.
It is not difficult to show (see Claim 8) that unless $D(\{1, \dots, k^*\}) > 3\epsilon_1$ (which can be easily
detected and provides evidence that the tester should reject), a random sample of $\Theta(1/\epsilon)$ draws
from $D$ will with high probability contain a "heavy" point $j > k^*$, that is, a point $j > k^*$ such that
$D(j) \ge (1 + \epsilon_2) D^*(j)$ (where $\epsilon_2 = \Theta(\epsilon)$). Given such a point $j$, there are two possibilities:

1. The first possibility is that a significant fraction of the sets $S_1, \dots, S_t$ have $D(j)/D(S_i)$
"noticeably different" from $D^*(j)/D^*(S_i)$. (Observe that since each set $S_i$ is a $\frac{1}{2}$-comparable
witness for $j$, it is possible to efficiently check whether this is the case.) If this is the case
then our tester should reject since this is evidence that $D \ne D^*$.

2. The second possibility is that almost every $S_i$ has $D(j)/D(S_i)$ very close to $D^*(j)/D^*(S_i)$.
If this is the case, though, then since $D(j) \ge (1 + \epsilon_2) D^*(j)$ and the union of $S_1, \dots, S_t$
is $\{1, \dots, j - 1\}$, it must be the case that $D(\{1, \dots, j\})$ is "significantly larger" than
$D^*(\{1, \dots, j\})$. This will be revealed by random sampling from $D$ and thus our testing
algorithm can reject in this case as well.



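Key quantities and useful claims. We define some quantities that are used in the algorithm
and its analysis. Let

\[ \epsilon_1 \stackrel{\rm def}{=} \frac{\epsilon}{10}, \qquad \epsilon_2 \stackrel{\rm def}{=} \frac{\epsilon}{2}, \qquad \epsilon_3 \stackrel{\rm def}{=} \frac{\epsilon}{48}, \qquad \epsilon_4 \stackrel{\rm def}{=} \frac{\epsilon}{\ldots} . \tag{42} \]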

Claim 7 Suppose there exists an element $k^* \in [N]$ that satisfies Equation (41). Fix any $j > k^*$.
Then

1. If $D^*(j) \ge \epsilon_1$, then $S_1 \stackrel{\rm def}{=} \{1, \dots, j - 1\}$ is an $\epsilon_1$-comparable witness for $j$;

2. If $D^*(j) < \epsilon_1$ then the set $\{1, \dots, j - 1\}$ can be partitioned into disjoint sets $S_1, \dots, S_t$ such
that each $S_i$, $1 \le i \le t$, is a $\frac{1}{2}$-comparable-witness for $j$.

Proof: First consider the case that $D^*(j) \ge \epsilon_1$. In this case $S_1 = \{1, \dots, j - 1\}$ is an $\epsilon_1$-comparable
witness for $j$ because $D^*(j) \ge \epsilon_1 \ge \epsilon_1 D^*(\{1, \dots, j - 1\})$ and $D^*(j) \le 1 \le \frac{1}{\epsilon_1} D^*(\{1, \dots, k^*\}) \le
\frac{1}{\epsilon_1} D^*(\{1, \dots, j - 1\})$, where the last inequality holds since $k^* \le j - 1$.

Next, consider the case that $D^*(j) < \epsilon_1$. In this case we build our intervals iteratively from
right to left, as follows. Let $j_1 = j - 1$ and let $j_2$ be the minimum index in $\{0, \dots, j_1 - 1\}$ such that

\[ D^*(\{j_2 + 1, \dots, j_1\}) \le D^*(j). \]

(Observe that we must have $j_2 \ge 1$, because $D^*(\{1, \dots, k^*\}) > \epsilon_1 > D^*(j)$.) Since
$D^*(\{j_2, \dots, j_1\}) > D^*(j)$ and the function $D^*(\cdot)$ is monotonically increasing, it must be the case
that

\[ \tfrac{1}{2} D^*(j) \le D^*(\{j_2 + 1, \dots, j_1\}) \le D^*(j). \]

Thus the interval $S_1 \stackrel{\rm def}{=} \{j_2 + 1, \dots, j_1\}$ is a $\frac{1}{2}$-comparable witness for $j$ as desired.
We continue in this fashion from right to left; i.e. if we have defined $j_2, \dots, j_t$ as above and
there is an index $j' \in \{0, \dots, j_t - 1\}$ such that $D^*(\{j' + 1, \dots, j_t\}) > D^*(j)$, then we define $j_{t+1}$ to
be the minimum index in $\{0, \dots, j_t - 1\}$ such that

\[ D^*(\{j_{t+1} + 1, \dots, j_t\}) \le D^*(j), \]

and we define $S_t$ to be the interval $\{j_{t+1} + 1, \dots, j_t\}$. The argument of the previous paragraph tells
us that

\[ \tfrac{1}{2} D^*(j) \le D^*(\{j_{t+1} + 1, \dots, j_t\}) \le D^*(j) \tag{43} \]

and hence $S_t$ is a $\frac{1}{2}$-comparable witness for $j$.

At some point, after intervals $S_1 = \{j_2 + 1, \dots, j_1\}, \dots, S_t = \{j_{t+1} + 1, \dots, j_t\}$ have been
defined in this way, it will be the case that there is no index $j' \in \{0, \dots, j_{t+1} - 1\}$ such that
$D^*(\{j' + 1, \dots, j_{t+1}\}) > D^*(j)$. At this point there are two possibilities: first, if $j_{t+1} + 1 = 1$, then
$S_1, \dots, S_t$ give the desired partition of $\{1, \dots, j - 1\}$. If $j_{t+1} + 1 > 1$ then it must be the case that
$D^*(\{1, \dots, j_{t+1}\}) \le D^*(j)$. In this case we simply add the elements $\{1, \dots, j_{t+1}\}$ to $S_t$, i.e. we
redefine $S_t$ to be $\{1, \dots, j_t\}$. By Equation (43) we have that

\[ \tfrac{1}{2} D^*(j) \le D^*(S_t) \le 2 D^*(j) \]

and thus $S_t$ is a $\frac{1}{2}$-comparable witness for $j$ as desired. This concludes the proof. ■
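
The right-to-left construction in the proof of Claim 7 (item 2) can be sketched as a short procedure. This is a minimal illustration under the claim's assumptions ($D^*$ sorted in non-decreasing order and $D^*(\{1, \dots, j-1\}) > D^*(j)$); the function name `half_comparable_partition` and the 1-indexed list convention are assumptions of this sketch.

```python
from fractions import Fraction

def half_comparable_partition(d_star, j):
    """Partition {1, ..., j-1} into intervals (lo, hi), listed right to left,
    each with mass between D*(j)/2 and 2*D*(j).

    d_star[i] is D*(i) for i >= 1 (index 0 is unused) and must be
    non-decreasing in i; the mass of {1, ..., j-1} must exceed D*(j).
    """
    target = d_star[j]
    prefix = [Fraction(0)] * j               # prefix[i] = D*({1, ..., i}) for i < j
    for i in range(1, j):
        prefix[i] = prefix[i - 1] + d_star[i]
    if prefix[j - 1] <= target:
        raise ValueError("requires D*({1..j-1}) > D*(j)")
    intervals = []
    hi = j - 1
    while prefix[hi] > target:               # enough mass remains for a new interval
        lo, total = hi, d_star[hi]
        # Extend leftwards while the interval mass stays at most D*(j).
        while lo > 1 and total + d_star[lo - 1] <= target:
            lo -= 1
            total += d_star[lo]
        intervals.append((lo, hi))
        hi = lo - 1
    if hi >= 1:                              # leftover of mass <= D*(j): merge into last interval
        intervals[-1] = (1, intervals[-1][1])
    return intervals
```

For example, with $D^*(i) = i/210$ on $[20]$ and $j = 10$, the procedure produces the singletons $\{9\}, \{8\}, \{7\}, \{6\}$ followed by $\{1, \dots, 5\}$, where the last interval is the result of merging the leftover $\{1, 2, 3\}$ into $\{4, 5\}$.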

Definition 4 (Heavy points) A point $j \in {\rm supp}(D^*)$ is said to be $\eta$-heavy if
$D(j) \ge (1 + \eta) D^*(j)$.

Claim 8 Suppose that $d_{\rm TV}(D, D^*) \ge \epsilon$ and Equation (41) holds. Suppose moreover that
$D(\{1, \dots, k^*\}) \le 4\epsilon_1$. Let $i_1, \dots, i_\ell$ be i.i.d. points drawn from $D$. Then for $\ell = \Theta(1/\epsilon)$, with
probability at least 99/100 (over the i.i.d. draws of $i_1, \dots, i_\ell \sim D$) there is some point $i_j \in \{i_1, \dots, i_\ell\}$
such that $i_j > k^*$ and $i_j$ is $\epsilon_2$-heavy.

Proof: Define $H_1$ to be the set of all $\epsilon_2$-heavy points and $H_2$ to be the set of all "slightly lighter"
points as follows:

\[ H_1 = \{\, i \in [N] \mid D(i) \ge (1 + \epsilon_2) D^*(i) \,\} \]
\[ H_2 = \{\, i \in [N] \mid (1 + \epsilon_2) D^*(i) > D(i) \ge D^*(i) \,\} \]

By definition of the total variation distance, we have

\begin{align*}
\epsilon \le d_{\rm TV}(D, D^*) &= \sum_{i : D(i) \ge D^*(i)} (D(i) - D^*(i)) = (D(H_1) - D^*(H_1)) + (D(H_2) - D^*(H_2)) \\
&\le D(H_1) + \bigl((1 + \epsilon_2) D^*(H_2) - D^*(H_2)\bigr) \\
&= D(H_1) + \epsilon_2 D^*(H_2) \le D(H_1) + \epsilon_2 = D(H_1) + \frac{\epsilon}{2}.
\end{align*}

So it must be the case that $D(H_1) \ge \epsilon/2 = 5\epsilon_1$. Since by assumption we have $D(\{1, \dots, k^*\}) \le 4\epsilon_1$,
it must be the case that $D(H_1 \setminus \{1, \dots, k^*\}) \ge \epsilon_1$. The claim follows from the definition of $H_1$ and
the size of the sample. ■
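
The last step of the proof ("the size of the sample") can be made quantitative with a small calculation. Each draw lands in $H_1 \setminus \{1, \dots, k^*\}$ with probability at least $\epsilon_1 = \epsilon/10$, so $\ell$ independent draws all miss with probability at most $(1 - \epsilon_1)^\ell \le e^{-\epsilon_1 \ell}$. The helper below (its name and the default failure probability are illustrative assumptions) returns one value of $\ell = \Theta(1/\epsilon)$ that suffices.

```python
import math

def heavy_sample_size(eps, failure=0.01):
    """A number of draws l such that (1 - eps/10)**l <= failure.

    Uses (1 - x)**l <= exp(-x * l), so l >= ln(1/failure) / (eps/10) suffices.
    """
    eps1 = eps / 10
    return math.ceil(math.log(1 / failure) / eps1)

# Illustrative check for a few values of eps.
for eps in (0.5, 0.1, 0.01):
    ell = heavy_sample_size(eps)
    assert (1 - eps / 10) ** ell <= 0.01
```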
Algorithm 5: COND$_D$-Test-Known
Input: error parameter $\epsilon > 0$; query access to COND$_D$ oracle; explicit description
$(D^*(1), \dots, D^*(N))$ of distribution $D^*$ satisfying $D^*(1) \le \cdots \le D^*(N)$
1: Let $i^*$ be the minimum index $i \in [N]$ such that $D^*(\{1, \dots, i\}) > 2\epsilon_1$.
2: if $D^*(\{1, \dots, i^* - 1\}) \le \epsilon_1$ then
3: Call algorithm COND$_D$-Test-Known-Heavy($\epsilon$, COND$_D$, $D^*$, $i^*$) (and exit)
4: else
5: Call algorithm COND$_D$-Test-Known-Main($\epsilon$, COND$_D$, $D^*$, $i^* - 1$) (and exit).
6: end if



Algorithm 6: COND$_D$-Test-Known-Heavy
Input: error parameter $\epsilon > 0$; query access to COND$_D$ oracle; explicit description
$(D^*(1), \dots, D^*(N))$ of distribution $D^*$ satisfying $D^*(1) \le \cdots \le D^*(N)$; value $i^* \in [N]$
satisfying $D^*(\{1, \dots, i^* - 1\}) \le \epsilon_1$, $D^*(\{1, \dots, i^*\}) > 2\epsilon_1$
1: Call the SAMP$_D$ oracle $m = \Theta((\log(1/\epsilon))/\epsilon^4)$ times. For each $i \in [i^*, N]$ let $\hat{D}(i)$ be the
fraction of the $m$ calls to SAMP$_D$ that returned $i$. Let $\hat{D}' = 1 - \sum_{i \in [i^*, N]} \hat{D}(i)$ be the fraction
of the $m$ calls that returned values in $\{1, \dots, i^* - 1\}$.
2: if either (any $i \in [i^*, N]$ has $|\hat{D}(i) - D^*(i)| > \epsilon_1^2$) or ($\hat{D}' - D^*(\{1, \dots, i^* - 1\}) > \epsilon_1$) then
3: output REJECT (and exit)
4: end if
5: Output ACCEPT
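
The case split performed in Lines 1 and 2 of Algorithm 5 is deterministic and depends only on the known distribution $D^*$, so it can be sketched directly. The function name `choose_case` and its return convention are hypothetical; the sketch takes $\epsilon_1 = \epsilon/10$ and uses exact rationals to avoid rounding at the thresholds.

```python
from fractions import Fraction

def choose_case(d_star, eps):
    """Decide which sub-tester Algorithm 5 would call.

    d_star is the list (D*(1), ..., D*(N)) sorted in non-decreasing order
    (so d_star[0] is D*(1)). Returns ("heavy", i_star) when
    D*({1..i*-1}) <= eps1, and ("main", k_star) with k_star = i* - 1 otherwise.
    """
    eps1 = Fraction(eps) / 10
    prefix = Fraction(0)
    for i, p in enumerate(d_star, start=1):
        before = prefix                  # D*({1, ..., i-1})
        prefix += p                      # D*({1, ..., i})
        if prefix > 2 * eps1:            # i is the minimum index i* of Line 1
            if before <= eps1:
                return ("heavy", i)
            return ("main", i - 1)
    raise ValueError("d_star must sum to more than 2 * eps1")
```

For instance, with $\epsilon = 1/2$ the uniform distribution on 100 points falls into the "main" case with $k^* = 10$, while a distribution with two points of mass 49/100 and two of mass 1/100 falls into the "heavy" case with $i^* = 3$.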



5.3.1 Proof of Theorem 9

It is straightforward to verify that the query complexity of COND$_D$-Test-Known-Heavy is $\tilde{O}(1/\epsilon^4)$
and the query complexity of COND$_D$-Test-Known-Main is also $\tilde{O}(1/\epsilon^4)$, so the overall query
complexity of COND-Test-Known is as claimed.

By the definition of $i^*$ (in the first line of the algorithm), either Equation (40) holds for this
setting of $i^*$, or Equation (41) holds for $k^* = i^* - 1$. To prove correctness of the algorithm, we first
deal with the simpler case, which is that Equation (40) holds:

Lemma 9 Suppose that $D^*$ is such that $D^*(\{1, \dots, i^*\}) > 2\epsilon_1$ but $D^*(\{1, \dots, i^* - 1\}) \le \epsilon_1$. Then
COND$_D$-Test-Known-Heavy($\epsilon$, COND$_D$, $D^*$, $i^*$) returns ACCEPT with probability at least 2/3 if
$D = D^*$ and returns REJECT with probability at least 2/3 if $d_{\rm TV}(D, D^*) \ge \epsilon$.

Proof: The conditions of Lemma 9, together with the fact that $D^*(\cdot)$ is monotone non-decreasing,
imply that each $i \ge i^*$ has $D^*(i) \ge \epsilon_1$. Thus there can be at most $1/\epsilon_1$ many values $i \in \{i^*, \dots, N\}$,
i.e. it must be the case that $i^* \ge N - 1/\epsilon_1 + 1$. Since the expected value of $\hat{D}(i)$ (defined in
Line 1 of COND$_D$-Test-Known-Heavy) is precisely $D(i)$, for any fixed value of $i \in \{i^*, \dots, N\}$ an
additive Chernoff bound implies that $|\hat{D}(i) - D(i)| \le (\epsilon_1)^2$ with failure probability at most $\frac{1}{10(1 + 1/\epsilon_1)}$.
Similarly $|\hat{D}' - D(\{1, \dots, i^* - 1\})| \le \epsilon_1$ with failure probability at most $\frac{1}{10(1 + 1/\epsilon_1)}$. A union bound






Algorithm 7: COND$_D$-Test-Known-Main
Input: error parameter $\epsilon > 0$; query access to COND$_D$ oracle; explicit description
$(D^*(1), \dots, D^*(N))$ of distribution $D^*$ satisfying $D^*(1) \le \cdots \le D^*(N)$; value $k^* \in [N]$
satisfying $\epsilon_1 < D^*(\{1, \dots, k^*\}) \le 2\epsilon_1 < D^*(\{1, \dots, k^* + 1\})$
1: Call the SAMP$_D$ oracle $\ldots$ times and let $\hat{D}(\{1, \dots, k^*\})$ denote the fraction of responses
that lie in $\{1, \dots, k^*\}$. If $\hat{D}(\{1, \dots, k^*\}) \notin [\ldots, \ldots]$ then output REJECT (and exit).
2: Call the SAMP$_D$ oracle $\ell = \Theta(1/\epsilon)$ times to obtain points $i_1, \dots, i_\ell$.
3: for all $j \in \{1, \dots, \ell\}$ such that $i_j > k^*$ do
4: Call the SAMP$_D$ oracle $m = \Theta(\log(1/\epsilon)/\epsilon)$ times and let $\hat{D}(\{1, \dots, i_j\})$ be the fraction of
responses that lie in $\{1, \dots, i_j\}$. If $\hat{D}(\{1, \dots, i_j\}) \notin [1 - \epsilon_3, 1 + \epsilon_3] D^*(\{1, \dots, i_j\})$ then
output REJECT (and exit).
5: if $D^*(i_j) \ge \epsilon_1$ then
6: Run COMPARE($\{i_j\}$, $\{1, \dots, i_j - 1\}$, $\ldots$, $\ldots$, $\frac{1}{10\ell}$) and let $v$ denote its output. If
$v \notin [1 - \ldots, 1 + \ldots] \frac{D^*(\{1, \dots, i_j - 1\})}{D^*(\{i_j\})}$ then output REJECT (and exit).
7: else
8: Let $S_1, \dots, S_t$ be the partition of $\{1, \dots, i_j - 1\}$ such that each $S_i$ is a $\frac{1}{2}$-comparable
witness for $i_j$, which is provided by Claim 7.
9: Select a list of $h = \Theta(1/\epsilon)$ elements $S_{a_1}, \dots, S_{a_h}$ independently and uniformly from
$\{S_1, \dots, S_t\}$.
10: For each $S_{a_r}$, $1 \le r \le h$, run COMPARE($\{i_j\}$, $S_{a_r}$, $\ldots$, 4, $\frac{1}{10\ell h}$) and let $v$ denote its output.
If $v \notin [1 - \ldots, 1 + \ldots] \frac{D^*(S_{a_r})}{D^*(\{i_j\})}$ then output REJECT (and exit).
11: end if
12: end for
13: Output ACCEPT.
over all failure events gives that with probability at least 9/10 each value $i \in \{i^*, \dots, N\}$ has
$|\hat{D}(i) - D(i)| \le \epsilon_1^2$ and additionally $|\hat{D}' - D(\{1, \dots, i^* - 1\})| \le \epsilon_1$; we refer to this compound event
as (*).

If $D^* = D$, by (*) the algorithm outputs ACCEPT with probability at least 9/10.

Now suppose that $d_{\rm TV}(D, D^*) \ge \epsilon$. With probability at least 9/10 we have (*) so we suppose
that indeed (*) holds. In this case we have

\begin{align*}
\epsilon \le d_{\rm TV}(D, D^*) &= \sum_{i < i^*} |D(i) - D^*(i)| + \sum_{i \ge i^*} |D(i) - D^*(i)| \\
&\le \sum_{i < i^*} (D(i) + D^*(i)) + \sum_{i \ge i^*} |D(i) - D^*(i)| \\
&\le D(\{1, \dots, i^* - 1\}) + \epsilon_1 + \sum_{i \ge i^*} \bigl(|\hat{D}(i) - D^*(i)| + \epsilon_1^2\bigr) \\
&\le \hat{D}' + \epsilon_1 + 2\epsilon_1 + \sum_{i \ge i^*} |\hat{D}(i) - D^*(i)| ,
\end{align*}

where the first inequality is by the triangle inequality, the second is by (*) and the fact that
$D^*(\{1, \dots, i^* - 1\}) \le \epsilon_1$, and the third inequality is by (*) and the fact that there are at most $1/\epsilon_1$
elements in $\{i^*, \dots, N\}$. Since $\epsilon_1 = \epsilon/10$, the above inequality implies that

\[ \frac{7}{10}\epsilon \le \hat{D}' + \sum_{i \ge i^*} |\hat{D}(i) - D^*(i)| . \]

If any $i \in \{i^*, \dots, N\}$ has $|\hat{D}(i) - D^*(i)| > (\epsilon_1)^2$ then the algorithm outputs REJECT so we may
assume that $|\hat{D}(i) - D^*(i)| \le \epsilon_1^2$ for all $i$. This implies that

\[ 6\epsilon_1 = \frac{6}{10}\epsilon \le \hat{D}' \]

but since $D^*(\{1, \dots, i^* - 1\}) \le \epsilon_1$ the algorithm must REJECT. ■

Now we turn to the more difficult (and typical) case, that Equation (41) holds (for $k^* = i^* - 1$),
i.e.

\[ \epsilon_1 < D^*(\{1, \dots, k^*\}) \le 2\epsilon_1 < D^*(\{1, \dots, k^* + 1\}). \]

With the claims we have already established it is straightforward to argue completeness:

Lemma 10 Suppose that $D = D^*$ and Equation (41) holds. Then with probability at least 2/3
algorithm COND$_D$-Test-Known-Main outputs ACCEPT.

Proof: We first observe that the expected value of the quantity $\hat{D}(\{1, \dots, k^*\})$ defined in Line 1
is precisely $D(\{1, \dots, k^*\}) = D^*(\{1, \dots, k^*\})$ and hence lies in $[\epsilon_1, 2\epsilon_1]$ by Equation (41). The
additive Chernoff bound implies that the probability the algorithm outputs REJECT in Line 1 is
at most 1/10. Thus we may assume the algorithm continues to Line 2.
In any given execution of Line 4, since the expected value of $\hat{D}(\{1, \dots, i_j\})$ is precisely
$D(\{1, \dots, i_j\}) = D^*(\{1, \dots, i_j\}) > \epsilon_1$, a multiplicative Chernoff bound gives that the algorithm
outputs REJECT with probability at most $1/(10\ell)$. Thus the probability that the algorithm
outputs REJECT in any execution of Line 4 is at most 1/10. We henceforth assume that the algorithm
never outputs REJECT in this step.

Fix a setting of $j \in \{1, \dots, \ell\}$ such that $i_j > k^*$. Consider first the case that $D^*(i_j) \ge \epsilon_1$ so
the algorithm enters Line 6. By item (1) of Claim 7 and item (1) of Lemma 2 we have that with
probability at least $1 - \frac{1}{10\ell}$ Compare outputs a value $v$ in the range $[1 - \ldots, 1 + \ldots] \frac{D^*(\{1, \dots, i_j - 1\})}{D^*(\{i_j\})}$
(recall that $D = D^*$), so the algorithm does not output REJECT in Line 6. Now suppose that
$D^*(i_j) < \epsilon_1$ so the algorithm enters Line 8. Fix a value $1 \le r \le h$ in Line 10. By Claim 7 we have
that $S_{a_r}$ is a $\frac{1}{2}$-comparable witness for $i_j$. By item (1) of Lemma 2, we have that with probability at
least $1 - \frac{1}{10\ell h}$ Compare outputs a value $v$ in the range $[1 - \ldots, 1 + \ldots] \frac{D^*(S_{a_r})}{D^*(\{i_j\})}$ (recall that $D = D^*$).
A union bound over all $h$ values of $r$ gives that the algorithm outputs REJECT in Line 10 with
probability at most $1/(10\ell)$. So in either case, for this setting of $j$, the algorithm outputs REJECT
on that iteration of the outer loop with probability at most $1/(10\ell)$. A union bound over all $\ell$
iterations of the outer loop gives that the probability that the algorithm outputs REJECT at any
execution of Line 6 or Line 10 is at most 1/10.

Thus the overall probability that the algorithm outputs REJECT is at most 3/10, and the lemma
is proved. ■

Next we argue soundness:

Lemma 11 Suppose that $d_{\rm TV}(D, D^*) \ge \epsilon$ and Equation (41) holds. Then with probability at least
2/3 algorithm COND$_D$-Test-Known-Main outputs REJECT.

Proof: If $D(\{1, \dots, k^*\}) \notin [\epsilon_1, 3\epsilon_1]$ then a standard additive Chernoff bound implies that the
algorithm outputs REJECT in Line 1 with probability at least 9/10. Thus we may assume going
forward in the argument that $D(\{1, \dots, k^*\}) \in [\epsilon_1, 3\epsilon_1]$. As a result we may apply Claim 8, and we
have that with probability at least 99/100 there is an element $i_j \in \{i_1, \dots, i_\ell\}$ such that $i_j > k^*$
and $i_j$ is $\epsilon_2$-heavy, i.e. $D(i_j) \ge (1 + \epsilon_2) D^*(i_j)$. We condition on this event going forward (the rest
of our analysis will deal with this specific element $i_j$).

We now consider two cases:

Case 1: Distribution $D$ has $D(\{1, \dots, i_j\}) \notin [1 - 3\epsilon_3, 1 + 3\epsilon_3] D^*(\{1, \dots, i_j\})$. Since the quantity
$\hat{D}(\{1, \dots, i_j\})$ obtained in Line 4 has expected value $D(\{1, \dots, i_j\}) \ge D(\{1, \dots, k^*\}) \ge \epsilon_1$, applying
the multiplicative Chernoff bound implies that $\hat{D}(\{1, \dots, i_j\}) \in [1 - \epsilon_3, 1 + \epsilon_3] D(\{1, \dots, i_j\})$ except
with failure probability at most $\epsilon/10 \le 1/10$. If this failure event does not occur then since
$D(\{1, \dots, i_j\}) \notin [1 - 3\epsilon_3, 1 + 3\epsilon_3] D^*(\{1, \dots, i_j\})$ it must hold that $\hat{D}(\{1, \dots, i_j\}) \notin [1 - \epsilon_3, 1 +
\epsilon_3] D^*(\{1, \dots, i_j\})$ and consequently the algorithm outputs REJECT. Thus in Case 1 the algorithm
outputs REJECT with overall probability at least 89/100.

Case 2: Distribution $D$ has $D(\{1, \dots, i_j\}) \in [1 - 3\epsilon_3, 1 + 3\epsilon_3] D^*(\{1, \dots, i_j\})$. This case is divided
into two sub-cases depending on the value of $D^*(i_j)$.

Case 2(a): $D^*(i_j) \ge \epsilon_1$. In this case the algorithm reaches Line 6. We use the following claim:
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Claim 12 In Case 2(a), suppose that ij > k* is such that D[ij) > (1 + e2)D*[ij), and 
D{{l,...,ij]) G [l-3e3,l + 3e3]I)*({l,...,ij}). Then 



D{i,) -V 4; D*{i,) 

Proof: To simplify notation we write 

a^^Dii,)- h^^'D^i,)- c'i'z)({l,...,i,-l}); d"^' D* {{I, . . . - I}). 
We have that 

a>{l + t2)h and o + c < (1 + 3e3)(6 + d). (44) 

This gives 

c < (1 + 3e3)(6 + d) - (1 + e2)b = (1 + 3e3)(i + (3e3 - £2)6 < (1 + 3e3)d , (45) 

where in the last inequality we used £2 > 3e3. Recalling that a > (1 + 62)6 and using 63 = £2/24 we 
get 

c (l + 3e3)c; ^ d 1 + €2/8 d _ e2\ , . 

a (1 + 62)6 6 1 + 62 6 V 4/ ■ ^ ^ 

This proves the claim. I 
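The last inequality in the proof of Claim 12, $(1+\epsilon_2/8)/(1+\epsilon_2) < 1-\epsilon_2/4$, can be checked numerically. The following is only a sanity-check sketch; the function name and the grid of test values are illustrative choices and do not come from the paper.

```python
def claim12_final_step_holds(eps2: float) -> bool:
    """Check (1 + 3*eps3) / (1 + eps2) < 1 - eps2/4 with eps3 = eps2/24.

    This is the final step in the proof of Claim 12; eps2 is assumed to
    lie in (0, 1].
    """
    eps3 = eps2 / 24.0                      # setting used in the proof
    lhs = (1.0 + 3.0 * eps3) / (1.0 + eps2)
    rhs = 1.0 - eps2 / 4.0
    return lhs < rhs


# Illustrative grid of values in (0, 1].
all_ok = all(claim12_final_step_holds(k / 1000.0) for k in range(1, 1001))
```

Algebraically, the inequality is equivalent to $\epsilon_2^2/4 < 5\epsilon_2/8$, which holds for every $0 < \epsilon_2 < 5/2$.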

Applying Claim 12 we get that in Line 6 we have

$$\frac{D(\{1,\ldots,i_j-1\})}{D(i_j)} \leq \left(1-\frac{\epsilon_2}{4}\right)\frac{D^*(\{1,\ldots,i_j-1\})}{D^*(i_j)}. \qquad (47)$$

Recalling that by the premise of this case $D^*(i_j) \geq \epsilon_1$, by applying Claim 7 we have that
$\{1,\ldots,i_j-1\}$ is an $\epsilon_1$-comparable witness for $i_j$. Therefore, by Lemma 2, with probability at
least 1 — the call to COMPARE({zj}, {1, . . . ,ij — 1}, j^, ^, ^) in Line [6] either outputs an ele- 
ment of {High, Low} or outputs a value v < {1 - f )(1 + < (1 - f)^^%^0y^- 
In either case the algorithm outputs REJECT in Line 6, so we are done with Case 2(a).

Case 2(b): $D^*(i_j) < \epsilon_1$. In this case the algorithm reaches Line 10, and by item 2 of Claim 7
we have that $S_1,\ldots,S_t$ is a partition of $\{1,\ldots,i_j-1\}$ and each set $S_1,\ldots,S_t$ is a $\frac{1}{2}$-comparable
witness for $i_j$, i.e.,

$$\text{for all } i \in \{1,\ldots,t\}, \quad \tfrac{1}{2}D^*(i_j) \leq D^*(S_i) \leq 2D^*(i_j). \qquad (48)$$

We use the following claim:

Claim 13 In Case 2(b) suppose $i_j > k^*$ is such that $D(i_j) \geq (1+\epsilon_2)D^*(i_j)$ and $D(\{1,\ldots,i_j\}) \in
[1-3\epsilon_3, 1+3\epsilon_3]D^*(\{1,\ldots,i_j\})$. Then at least an $(\epsilon_4/8)$-fraction of the sets $S_1,\ldots,S_t$ are such that

$$D(S_i) \leq (1+\epsilon_4)D^*(S_i).$$

Proof: The proof is by contradiction. Let $\rho = 1 - \epsilon_4/8$ and suppose that there are $w$ sets
(without loss of generality we call them $S_1,\ldots,S_w$) that satisfy $D(S_i) > (1+\epsilon_4)D^*(S_i)$, where
$\rho' = \frac{w}{t} > \rho$. We first observe that the weight of the $w$ subsets $S_1,\ldots,S_w$ under $D^*$, as a fraction of
$D^*(\{1,\ldots,i_j-1\})$, is at least

$$\frac{D^*(S_1\cup\cdots\cup S_w)}{D^*(S_1\cup\cdots\cup S_w) + (t-w)\cdot 2D^*(i_j)} \geq \frac{w\frac{D^*(i_j)}{2}}{w\frac{D^*(i_j)}{2} + (t-w)\cdot 2D^*(i_j)} = \frac{w}{4t-3w} = \frac{\rho'}{4-3\rho'},$$

where we used the right inequality in Equation (48) on $S_{w+1},\ldots,S_t$ to obtain the leftmost expression
above, and the left inequality in Equation (48) (together with the fact that $\frac{x}{x+c}$ is an increasing
function of $x$ for all $c > 0$) to obtain the inequality above. This implies that

$$D(\{1,\ldots,i_j-1\}) = \sum_{i=1}^{w} D(S_i) + \sum_{i=w+1}^{t} D(S_i) \geq (1+\epsilon_4)\sum_{i=1}^{w} D^*(S_i) + \sum_{i=w+1}^{t} D(S_i)$$

$$\geq (1+\epsilon_4)\frac{\rho'}{4-3\rho'}D^*(\{1,\ldots,i_j-1\})$$

$$\geq (1+\epsilon_4)\frac{\rho}{4-3\rho}D^*(\{1,\ldots,i_j-1\}). \qquad (49)$$
From Equation (49) we have

D{{l,...,i,}) > {l + e,)j^^D*{{l,...,i,-l}) + {l + e2)D*[i,) 

> (1 + ^) . . . , ^, - 1}) + (1 + e2)D*{i,) 

where for the first inequality above we used D{ij) > (1 + €2)0* {ij) and for the second inequality 
we used (1 + £4)3^ > 1 + This imphes that 

D{{1, ij}) > (1 + ^) . . . , - 1}) + (1 + ^) D*{i,) = (1 + ^) • • • > ij}) 

where the inequality follows from £2 > Since ^ > 3e3, though, this is a contradiction and the 
claim is proved. I 

Applying Claim 13 and recalling that $h = \Theta(1/\epsilon) = \Theta(1/\epsilon_4)$ sets are chosen randomly
in Line El we have that with probability at least 9/10 there is some $r \in \{1,\ldots,h\}$ such that
$D(S_{a_r}) \leq (1+\epsilon_4)D^*(S_{a_r})$. Combining this with $D(i_j) \geq (1+\epsilon_2)D^*(i_j)$, we get that

D{ij) -1 + 62 D*{ij) -\ 2) D*{ij) ' 

By LemmaEl with probability at least 1 — the call to COMPARE({zj}, Sa^ ^^A, j^) '^^ LinefTOl 
either outputs an element of {High, Low } or outputs a value -y < (1 — y)(1 + ^) ^£)*fi')^ < (1 ~ 
In either case the algorithm outputs REJECT in Line 10, so we are done in Case 2(b).
This concludes the proof of soundness and the proof of Theorem 6. ■






6 Testing equality between two unknown distributions 



6.1 An approach based on PCOND queries 

In this subsection we consider the problem of testing whether two unknown distributions $D_1, D_2$
are identical versus $\epsilon$-far, given PCOND access to these distributions. Although this is known
to require $\Omega(N^{2/3})$ many samples in the standard model [BFR+10, Val11], we are able to give
a poly$(\log N, 1/\epsilon)$-query algorithm using PCOND queries, by taking advantage of comparisons to
perform some sort of clustering of the domain.

At a high level the algorithm works as follows. First it obtains (with high probability) a small
set of points $R$ such that almost every element in $[N]$, except possibly for some negligible subset
according to $D_1$, has probability weight (under $D_1$) close to some "representative" in $R$. Next,
for each representative $r$ in $R$ it obtains an estimate of the weight, according to $D_1$, of a set of
points $U$ such that $D_1(u)$ is close to $D_1(r)$ for each $u$ in $U$ (i.e., $r$'s "neighborhood under $D_1$").
This is done using the procedure Estimate-Neighborhood from Subsection 3.2. Note that
these neighborhoods can be interpreted roughly as a succinct cover of the support of $D_1$ into (not
necessarily disjoint) sets of points, where within each set the points have similar weight (according
to $D_1$). Our algorithm is based on the observation that, if $D_1$ and $D_2$ are far from each other, it
must be the case that one of these sets, denoted $U^*$, reflects it in one of the following ways: (1)
$D_2(U^*)$ differs significantly from $D_1(U^*)$; (2) $U^*$ contains a subset of points $V^*$ such that $D_2(v)$
differs significantly from $D_2(r)$ for each $v$ in $V^*$, and either $D_1(V^*)$ is relatively large or $D_2(V^*)$
is relatively large. (This structural result is made precise in Lemma 15.) We thus take additional
samples, both from $D_1$ and from $D_2$, and compare the weight (according to both distributions)
of each point in these samples to the representatives in $R$ (using the procedure Compare from
Subsection 3.1). In this manner we detect (with high probability) that either (1) or (2) holds.

We begin by formalizing the notion of a cover discussed above: 

Definition 5 (Weight-Cover) Given a distribution $D$ on $[N]$ and a parameter $\epsilon_1 > 0$, we say
that a point $i \in [N]$ is $\epsilon_1$-covered by a set $R = \{r_1,\ldots,r_t\} \subseteq [N]$ if there exists a point $r_j \in R$
such that $D(i) \in [1/(1+\epsilon_1), 1+\epsilon_1]D(r_j)$. Let the set of points in $[N]$ that are $\epsilon_1$-covered by $R$ be
denoted by $U^{D}_{\epsilon_1}(R)$. We say that $R$ is an $(\epsilon_1,\epsilon_2)$-cover for $D$ if $D([N] \setminus U^{D}_{\epsilon_1}(R)) \leq \epsilon_2$.
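To make the definition concrete, here is a minimal sketch that checks the cover property for an explicitly given distribution. It assumes the distribution is available as a dictionary from points to probabilities, which a testing algorithm of course does not have; the function names are illustrative only.

```python
from typing import Dict, Iterable, Set


def covered_points(D: Dict[int, float], R: Iterable[int], eps1: float) -> Set[int]:
    """Points i with D(i) in [D(r)/(1+eps1), (1+eps1)*D(r)] for some r in R."""
    rep_weights = [D[r] for r in R]
    return {
        i
        for i, p in D.items()
        if any(w / (1.0 + eps1) <= p <= (1.0 + eps1) * w for w in rep_weights)
    }


def is_cover(D: Dict[int, float], R: Iterable[int], eps1: float, eps2: float) -> bool:
    """R is an (eps1, eps2)-cover for D if the uncovered weight is at most eps2."""
    covered = covered_points(D, list(R), eps1)
    uncovered_weight = sum(p for i, p in D.items() if i not in covered)
    return uncovered_weight <= eps2


# Toy example: point 3 is too light to be covered by the representative 1.
toy = {1: 0.5, 2: 0.45, 3: 0.05}
```

In the toy example, `covered_points(toy, [1], 0.2)` contains points 1 and 2, so `[1]` is a (0.2, 0.1)-cover but not a (0.2, 0.01)-cover.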

The following lemma says that a small sample of points drawn from D gives a cover with high 
probability: 

Lemma 14 Let $D$ be any distribution over $[N]$. Given any fixed $c > 0$, there exists a constant
$c' > 0$ such that with probability at least 99/100, a sample $R$ of size $m = c' \cdot \frac{\log(N/\epsilon)}{\epsilon^2} \cdot \log\left(\frac{\log(N/\epsilon)}{\epsilon}\right)$
drawn according to distribution $D$ is an $(\epsilon/c, \epsilon/c)$-cover for $D$.

Proof: Let t denote [ln(2cA^/e) • We define t "buckets" of points with similar weight under D 
as follows: for $i = 0, 1, \ldots, t-1$, define $B_i \subseteq [N]$ to be

$$B_i := \left\{ x \in [N] : \frac{1}{(1+\epsilon/c)^{i+1}} < D(x) \leq \frac{1}{(1+\epsilon/c)^{i}} \right\}.$$

Let $L$ be the set of points $x$ which are not in any of $B_0,\ldots,B_{t-1}$ (because $D(x)$ is too small); since
every point in $L$ has $D(x) < \frac{\epsilon}{2cN}$, we have $D(L) < \frac{\epsilon}{2c}$.
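The bucketing can be illustrated with a short sketch that maps a probability value to the index of its bucket. The function name is hypothetical, and the number of buckets `t` is passed in as a parameter rather than computed, so the sketch makes no assumption about its exact value.

```python
import math
from typing import Optional


def bucket_index(p: float, eps: float, c: float, t: int) -> Optional[int]:
    """Return i with (1+eps/c)^-(i+1) < p <= (1+eps/c)^-i, or None.

    None is returned when p is zero or too small to fall in any of the
    buckets B_0, ..., B_{t-1} (such points form the set L in the proof).
    """
    if p <= 0.0:
        return None
    base = 1.0 + eps / c
    i = math.floor(-math.log(p) / math.log(base))
    # Guard against floating-point drift at bucket boundaries.
    while i > 0 and p > base ** (-i):
        i -= 1
    while p <= base ** (-(i + 1)):
        i += 1
    return i if 0 <= i < t else None
```

For instance, with `eps = 1` and `c = 1` the bucket boundaries are powers of 1/2, so a point of weight 0.3 lands in bucket 1, whose range is (1/4, 1/2].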

It is easy to see that if the sample $R$ contains a point from a bucket $B_j$ then every point $y \in B_j$
is $\frac{\epsilon}{c}$-covered by $R$. We say that bucket $B_i$ is insignificant if $D(B_i) \leq \frac{\epsilon}{2ct}$; otherwise bucket $B_i$ is
significant. It is clear that the total weight under $D$ of all insignificant buckets is at most $\epsilon/2c$. Thus
if we can show that for the claimed sample size, with probability at least 99/100 every significant 
bucket has at least one of its points in R, we will have established the lemma. 

This is a simple probabilistic calculation: fix any significant bucket $B_j$. The probability that
$m$ random draws from $D$ all miss $B_j$ is at most $\left(1 - \frac{\epsilon}{2ct}\right)^m$, which is at most $\frac{1}{100t}$ for a suitable
(absolute constant) choice of $c'$. Thus a union bound over all (at most $t$) significant buckets gives
that with probability at least 99/100, no significant bucket is missed by $R$. ■
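The calculation can be spelled out as follows. This is a sketch that assumes the significance threshold is $\epsilon/(2ct)$ (the value consistent with the $\epsilon/2c$ bound on the total weight of insignificant buckets) and uses the standard inequality $1 - x \leq e^{-x}$:

```latex
\left(1 - \frac{\epsilon}{2ct}\right)^{m}
  \;\le\; \exp\!\left(-\frac{\epsilon m}{2ct}\right)
  \;\le\; \frac{1}{100t}
\qquad \text{whenever} \qquad
m \;\ge\; \frac{2ct}{\epsilon}\,\ln(100t).
```

Summing the failure probability $1/(100t)$ over at most $t$ significant buckets gives a total failure probability of at most 1/100.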



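Lemma 15 Suppose $d_{TV}(D_1, D_2) \geq \epsilon$, and let $R = \{r_1,\ldots,r_t\}$ be an $(\tilde\epsilon,\tilde\epsilon)$-cover for $D_1$ where
$\tilde\epsilon \leq \epsilon/100$. Then, there exists $j \in [t]$ such that at least one of the following conditions holds for
every $\alpha \in [\tilde\epsilon, 2\tilde\epsilon]$:

1. $D_1(U^{D_1}_{\alpha}(r_j)) \geq \frac{\tilde\epsilon}{t}$ and $D_2(U^{D_1}_{\alpha}(r_j)) \notin [1-\tilde\epsilon, 1+\tilde\epsilon]D_1(U^{D_1}_{\alpha}(r_j))$, or $D_1(U^{D_1}_{\alpha}(r_j)) < \frac{\tilde\epsilon}{t}$ and
$D_2(U^{D_1}_{\alpha}(r_j)) > \frac{2\tilde\epsilon}{t}$;

2. $D_1(U^{D_1}_{\alpha}(r_j)) \geq \frac{\tilde\epsilon}{t}$, and at least a $\tilde\epsilon$-fraction of the points $i$ in $U^{D_1}_{\alpha}(r_j)$ satisfy
$\frac{D_2(i)}{D_2(r_j)} \notin [1/(1+\alpha+\tilde\epsilon), 1+\alpha+\tilde\epsilon]$;

3. $D_1(U^{D_1}_{\alpha}(r_j)) \geq \frac{\tilde\epsilon}{t}$, and the total weight according to $D_2$ of the points $i$ in $U^{D_1}_{\alpha}(r_j)$ for which
$\frac{D_2(i)}{D_2(r_j)} \notin [1/(1+\alpha+\tilde\epsilon), 1+\alpha+\tilde\epsilon]$ is at least $\frac{\tilde\epsilon^2}{t}$.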



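Proof: Without loss of generality, we can assume that $\epsilon \leq 1/4$. Suppose, contrary to the claim,
that for each $r_j$ there exists $\alpha_j \in [\tilde\epsilon, 2\tilde\epsilon]$ such that if we let $U_j := U^{D_1}_{\alpha_j}(r_j)$, then the following holds:

1. If $D_1(U_j) < \frac{\tilde\epsilon}{t}$, then $D_2(U_j) \leq \frac{2\tilde\epsilon}{t}$;

2. If $D_1(U_j) \geq \frac{\tilde\epsilon}{t}$, then:

(a) $D_2(U_j) \in [1-\tilde\epsilon, 1+\tilde\epsilon]D_1(U_j)$;

(b) Less than an $\tilde\epsilon$-fraction of the points $y$ in $U_j$ satisfy $\frac{D_2(y)}{D_2(r_j)} \notin [1/(1+\alpha_j+\tilde\epsilon), 1+\alpha_j+\tilde\epsilon]$;

(c) The total weight according to $D_2$ of the points $y$ in $U_j$ for which
$\frac{D_2(y)}{D_2(r_j)} \notin [1/(1+\alpha_j+\tilde\epsilon), 1+\alpha_j+\tilde\epsilon]$ is at most $\frac{\tilde\epsilon^2}{t}$.

We show that in such a case $d_{TV}(D_1, D_2) < \epsilon$, contrary to the premise of the claim.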

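Consider each point $r_j \in R$ such that $D_1(U_j) \geq \frac{\tilde\epsilon}{t}$. By the foregoing discussion (point 2(a)),
$D_2(U_j) \in [1-\tilde\epsilon, 1+\tilde\epsilon]D_1(U_j)$. By the definition of $U_j$ (and since $\alpha_j \leq 2\tilde\epsilon$),

$$D_1(r_j) \in [1/(1+2\tilde\epsilon), 1+2\tilde\epsilon]\,\frac{D_1(U_j)}{|U_j|}. \qquad (50)$$

Turning to bound $D_2(r_j)$, on one hand (by 2(b))

$$D_2(U_j) = \sum_{y \in U_j} D_2(y) \geq \tilde\epsilon|U_j| \cdot 0 + (1-\tilde\epsilon)|U_j| \cdot \frac{D_2(r_j)}{1+3\tilde\epsilon}, \qquad (51)$$

and so

$$D_2(r_j) \leq \frac{(1+3\tilde\epsilon)D_2(U_j)}{(1-\tilde\epsilon)|U_j|} \leq (1+6\tilde\epsilon)\frac{D_2(U_j)}{|U_j|}. \qquad (52)$$

On the other hand (by 2(c)),

$$D_2(U_j) = \sum_{y \in U_j} D_2(y) \leq \frac{\tilde\epsilon^2}{t} + |U_j| \cdot (1+3\tilde\epsilon)D_2(r_j), \qquad (53)$$

and so

$$D_2(r_j) \geq \frac{D_2(U_j) - \tilde\epsilon^2/t}{(1+3\tilde\epsilon)|U_j|} \geq \frac{(1-2\tilde\epsilon)D_2(U_j)}{(1+3\tilde\epsilon)|U_j|} \geq (1-5\tilde\epsilon)\frac{D_2(U_j)}{|U_j|}. \qquad (54)$$

Therefore, for each such $r_j$ we have

$$D_2(r_j) \in [1-8\tilde\epsilon, 1+10\tilde\epsilon]D_1(r_j). \qquad (55)$$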

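Let $C := \bigcup_{j=1}^{t} U_j$. We next partition the points in $C$ so that each point $i \in C$ is assigned to some
$r_{j(i)}$ such that $i \in U_{j(i)}$. We define the following "bad" subsets of points in $[N]$:

1. $B_1 := [N] \setminus C$, so that $D_1(B_1) \leq \tilde\epsilon$ (we later bound $D_2(B_1)$);

2. $B_2 := \{i \in C : D_1(U_{j(i)}) < \tilde\epsilon/t\}$, so that $D_1(B_2) \leq \tilde\epsilon$ and $D_2(B_2) \leq 2\tilde\epsilon$;

3. $B_3 := \{i \in C \setminus B_2 : D_2(i) \notin [1/(1+3\tilde\epsilon), 1+3\tilde\epsilon]D_2(r_{j(i)})\}$, so that $D_1(B_3) \leq 2\tilde\epsilon$ and
$D_2(B_3) \leq \tilde\epsilon^2$.

Let $B := B_1 \cup B_2 \cup B_3$. Observe that for each $i \in [N] \setminus B$ we have that

$$D_2(i) \in [1/(1+3\tilde\epsilon), 1+3\tilde\epsilon]D_2(r_{j(i)}) \subseteq [1-15\tilde\epsilon, 1+15\tilde\epsilon]D_1(r_{j(i)}) \subseteq [1-23\tilde\epsilon, 1+23\tilde\epsilon]D_1(i), \qquad (56)$$

where the first containment follows from the fact that $i \notin B$, the second follows from Equation (55),
and the third from the fact that $i \in U_{j(i)}$. In order to complete the proof we need a bound on
$D_2(B_1)$, which we obtain next.

$$D_2(B_1) = 1 - D_2([N] \setminus B_1) \leq 1 - D_2([N] \setminus B) \leq 1 - (1-23\tilde\epsilon)D_1([N] \setminus B) \leq 1 - (1-23\tilde\epsilon)(1-4\tilde\epsilon) \leq 27\tilde\epsilon. \qquad (57)$$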

Therefore,

$$d_{TV}(D_1, D_2) = \frac{1}{2}\sum_{i=1}^{N} |D_1(i) - D_2(i)| \leq \frac{1}{2}\left(D_1(B) + D_2(B) + \sum_{i \notin B} 23\tilde\epsilon D_1(i)\right) < \epsilon, \qquad (58)$$

and we have reached a contradiction.

Algorithm 8: Algorithm PCOND$_{D_1,D_2}$-Test-Equality-Unknown
Input: PCOND query access to distributions $D_1$ and $D_2$ and a parameter $\epsilon$.

1. Set $\tilde\epsilon = \epsilon/100$.

2. Draw a sample $R$ of size $t = \tilde\Theta\left(\frac{\log N}{\tilde\epsilon^2}\right)$ from $D_1$.

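3. For each $r_j \in R$:

(a) Call Estimate-Neighborhood$_{D_1}$ on $r_j$ with $\kappa = \tilde\epsilon$, $\eta = \frac{\tilde\epsilon}{8}$, $\beta = \frac{\tilde\epsilon}{2t}$, $\delta = \frac{1}{100t}$ and let
the output be denoted by $(\hat{w}^{(1)}_j, \alpha_j)$.

(b) Set $\theta = \kappa\eta\beta\delta/64 = \tilde\Theta(\tilde\epsilon^7/\log^2 N)$.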

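(c) Draw a sample $S_1$ from $D_1$, of size $s_1 = \Theta\left(\frac{t}{\tilde\epsilon^2}\right) = \tilde\Theta\left(\frac{\log N}{\tilde\epsilon^4}\right)$.

(d) Draw a sample $S_2$ from $D_2$, of size $s_2 = \Theta\left(\frac{t\log t}{\tilde\epsilon^3}\right) = \tilde\Theta\left(\frac{\log N}{\tilde\epsilon^5}\right)$.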

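(e) For each point $i \in S_1 \cup S_2$ call Compare$_{D_1}(\{r_j\}, \{i\}, \theta/4, 4, 1/(200t(s_1+s_2)))$ and
Compare$_{D_2}(\{r_j\}, \{i\}, \theta/4, 4, 1/(200t(s_1+s_2)))$, and let the outputs be denoted $\rho^{(1)}_{r_j}(i)$
and $\rho^{(2)}_{r_j}(i)$, respectively (where in particular these outputs may be High or Low).

(f) Let $\hat{w}^{(2)}_j$ be the fraction of occurrences of $i \in S_2$ such that
$\rho^{(1)}_{r_j}(i) \in [1/(1+\alpha_j+\theta/2), 1+\alpha_j+\theta/2]$.

(g) If ( $\hat{w}^{(1)}_j \leq \frac{3}{4}\frac{\tilde\epsilon}{t}$ and $\hat{w}^{(2)}_j > \frac{3}{2}\frac{\tilde\epsilon}{t}$ ) or ( $\hat{w}^{(1)}_j > \frac{3}{4}\frac{\tilde\epsilon}{t}$ and $\hat{w}^{(2)}_j/\hat{w}^{(1)}_j \notin [1-\tilde\epsilon/2, 1+\tilde\epsilon/2]$ ), then
output REJECT.

(h) If there exists $i \in S_1 \cup S_2$ such that $\rho^{(1)}_{r_j}(i) \in [1/(1+\alpha_j+\tilde\epsilon/2), 1+\alpha_j+\tilde\epsilon/2]$ and
$\rho^{(2)}_{r_j}(i) \notin [1/(1+\alpha_j+3\tilde\epsilon/2), 1+\alpha_j+3\tilde\epsilon/2]$, then output REJECT.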

4. Output ACCEPT. 



Theorem 10 If $D_1 = D_2$ then with probability at least 2/3 Algorithm PCOND-Test-Equality-Unknown
returns ACCEPT, and if $d_{TV}(D_1, D_2) \geq \epsilon$, then with probability at least 2/3 Algorithm
PCOND-Test-Equality-Unknown returns REJECT. The number of PCOND queries performed
by the algorithm is $\tilde{O}\left(\frac{\log^6 N}{\epsilon^{21}}\right)$.

Proof: The number of queries performed by the algorithm is the sum of: (1) $t$ times the number of
queries performed in each execution of Estimate-Neighborhood (in Line 3-a) and (2) $t \cdot (s_1+s_2) =
O(t \cdot s_2)$ times the number of queries performed in each execution of Compare (in Line 3-e). By
Lemma 3 (and the settings of the parameters in the calls to Estimate-Neighborhood), the
first term is o(t • '°g(V^)-l°ga°g(i//)/(/3r?^)) ^ ^ o(^), and by Lemma [2] (and the settings of the 

parameters in the calls to Compare), the second term is O (t • S2 • = O , so that 

we get the bound stated in the theorem. 

We now turn to establishing the correctness of the algorithm. We shall use the shorthand $U_j$
for $U^{D_1}_{\alpha_j}(r_j)$, and $U'_j$ for $U^{D_1}_{\alpha_j+\theta}(r_j)$. We consider the following "desirable" events.

1. The event $E_1$ is that the sample $R$ is a $(\tilde\epsilon,\tilde\epsilon)$-weight-cover for $D_1$ (for $\tilde\epsilon = \epsilon/100$). By Lemma 14
(and an appropriate constant in the $\Theta(\cdot)$ notation for the size of $R$), the probability that $E_1$
holds is at least 99/100.

2. The event $E_2$ is that all calls to the procedure Estimate-Neighborhood are as specified
by Lemma 3. By the setting of the confidence parameter in the calls to the procedure, the
event $E_2$ holds with probability at least 99/100.

3. The event $E_3$ is that all calls to the procedure Compare are as specified by Lemma 2. By
the setting of the confidence parameter in the calls to the procedure, the event $E_3$ holds with
probability at least 99/100.

4. The event $E_4$ is that $D_2(U'_j \setminus U_j) \leq \eta\beta/16 = \tilde\epsilon^2/(256t)$ for each $j$. If $D_2 = D_1$ then this event
follows from $E_2$. Otherwise, it holds with probability at least 99/100 by the setting of $\theta$ and
the choice of $\alpha_j$ (as shown in the proof of Lemma 3 in the analysis of the event $E_1$ there).

5. The event $E_5$ is defined as follows. For each $j$, if $D_2(U_j) \geq \tilde\epsilon/(4t)$, then $|S_2 \cap U_j|/|S_2| \in
[1-\tilde\epsilon/10, 1+\tilde\epsilon/10]D_2(U_j)$, and if $D_2(U_j) < \tilde\epsilon/(4t)$ then $|S_2 \cap U_j|/|S_2| < (1+\tilde\epsilon/10)\tilde\epsilon/(4t)$. This
event holds with probability at least 99/100 by applying a multiplicative Chernoff bound in
the first case, and Corollary 2 in the second.

6. The event $E_6$ is that for each $j$ we have $|S_2 \cap (U'_j \setminus U_j)|/|S_2| < \tilde\epsilon^2/(128t)$. Conditioned on $E_4$,
the event $E_6$ holds with probability at least 99/100 by applying Corollary 2.

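From this point on we assume that events $E_1$ through $E_6$ all hold. Note that in particular this implies
the following:

1. By $E_2$, for every $j$:

• If $D_1(U_j) \geq \beta = \tilde\epsilon/(2t)$, then $\hat{w}^{(1)}_j \in [1-\eta, 1+\eta]D_1(U_j) = [1-\tilde\epsilon/8, 1+\tilde\epsilon/8]D_1(U_j)$.

• If $D_1(U_j) < \tilde\epsilon/(2t)$, then $\hat{w}^{(1)}_j \leq (1+\tilde\epsilon/8)(\tilde\epsilon/(2t))$.

2. By $E_3$, for every $j$ and for each point $i \in S_1 \cup S_2$:

• If $i \in U_j$, then $\rho^{(1)}_{r_j}(i) \in [1/(1+\alpha_j+\frac{\theta}{2}), 1+\alpha_j+\frac{\theta}{2}]$.

• If $i \notin U'_j$, then $\rho^{(1)}_{r_j}(i) \notin [1/(1+\alpha_j+\frac{\theta}{2}), 1+\alpha_j+\frac{\theta}{2}]$.

3. By the previous item and $E_4$ through $E_6$:

• If $D_2(U_j) \geq \tilde\epsilon/(4t)$, then $\hat{w}^{(2)}_j \geq (1-\tilde\epsilon/10)D_2(U_j)$ and $\hat{w}^{(2)}_j \leq (1+\tilde\epsilon/10)D_2(U_j) +
\tilde\epsilon^2/(128t) \leq (1+\tilde\epsilon/8)D_2(U_j)$.

• If $D_2(U_j) < \tilde\epsilon/(4t)$ then $\hat{w}^{(2)}_j \leq (1+\tilde\epsilon/10)\tilde\epsilon/(4t) + \tilde\epsilon^2/(128t) \leq (1+\tilde\epsilon/4)(\tilde\epsilon/(4t))$.

Completeness. Assume $D_1$ and $D_2$ are the same distribution $D$. For each $j$, if $D(U_j) \geq \tilde\epsilon/t$,
then by the foregoing discussion, $\hat{w}^{(1)}_j \geq (1-\tilde\epsilon/8)D(U_j) > 3\tilde\epsilon/(4t)$ and $\hat{w}^{(2)}_j/\hat{w}^{(1)}_j \in [(1-\tilde\epsilon/8)^2, (1+
\tilde\epsilon/8)^2] \subset [1-\tilde\epsilon/2, 1+\tilde\epsilon/2]$, so that the algorithm does not reject in Line 3-g. Otherwise (i.e.,
$D(U_j) < \tilde\epsilon/t$), we consider two subcases. Either $D(U_j) \leq \tilde\epsilon/(2t)$, in which case $\hat{w}^{(1)}_j \leq 3\tilde\epsilon/(4t)$,
or $\tilde\epsilon/(2t) < D(U_j) < \tilde\epsilon/t$, and then $\hat{w}^{(1)}_j \in [1-\tilde\epsilon/8, 1+\tilde\epsilon/8]D_1(U_j)$. Since in both cases $\hat{w}^{(2)}_j \leq
(1+\tilde\epsilon/8)D(U_j) \leq 3\tilde\epsilon/(2t)$, the algorithm does not reject in Line 3-g. By $E_3$, the algorithm does not
reject in Line 3-h either. We next turn to establish soundness.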

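Soundness. Assume $d_{TV}(D_1, D_2) \geq \epsilon$. By applying Lemma 15 on $R$ (and using $E_1$), there exists
an index $j$ for which one of the items in the lemma holds. We denote this index by $j^*$, and consider
the three items in the lemma.

1. If Item 1 holds, then we consider its two cases:

(a) In the first case, $D_1(U_{j^*}) \geq \tilde\epsilon/t$ and $D_2(U_{j^*}) \notin [1-\tilde\epsilon, 1+\tilde\epsilon]D_1(U_{j^*})$. Due to the lower
bound on $D_1(U_{j^*})$ we have that $\hat{w}^{(1)}_{j^*} \in [1-\tilde\epsilon/8, 1+\tilde\epsilon/8]D_1(U_{j^*})$, so that in particular
$\hat{w}^{(1)}_{j^*} > 3\tilde\epsilon/(4t)$. As for $\hat{w}^{(2)}_{j^*}$, either $\hat{w}^{(2)}_{j^*} < (1-\tilde\epsilon)(1+\tilde\epsilon/8)D_1(U_{j^*})$ (this holds both when
$D_2(U_{j^*}) \geq \tilde\epsilon/(4t)$ and when $D_2(U_{j^*}) < \tilde\epsilon/(4t)$) or $\hat{w}^{(2)}_{j^*} > (1+\tilde\epsilon)(1-\tilde\epsilon/10)D_1(U_{j^*})$. In
either (sub)case $\hat{w}^{(2)}_{j^*}/\hat{w}^{(1)}_{j^*} \notin [1-\tilde\epsilon/2, 1+\tilde\epsilon/2]$, causing the algorithm to reject in (the
second part of) Line 3-g.

(b) In the second case, $D_1(U_{j^*}) < \tilde\epsilon/t$ and $D_2(U_{j^*}) > 2\tilde\epsilon/t$. Due to the lower bound on
$D_2(U_{j^*})$ we have that $\hat{w}^{(2)}_{j^*} \geq (1-\tilde\epsilon/10)D_2(U_{j^*}) > (1-\tilde\epsilon/10)(2\tilde\epsilon/t)$, so that in particular
$\hat{w}^{(2)}_{j^*} > 3\tilde\epsilon/(2t)$. As for $\hat{w}^{(1)}_{j^*}$, if $D_1(U_{j^*}) \leq \tilde\epsilon/(2t)$, then $\hat{w}^{(1)}_{j^*} \leq 3\tilde\epsilon/(4t)$, causing the
algorithm to reject in (the first part of) Line 3-g. If $\tilde\epsilon/(2t) < D_1(U_{j^*}) < \tilde\epsilon/t$, then $\hat{w}^{(1)}_{j^*} \in
[1-\tilde\epsilon/8, 1+\tilde\epsilon/8]D_1(U_{j^*})$, which is at most $(1+\tilde\epsilon/8)(\tilde\epsilon/t)$, so that $\hat{w}^{(2)}_{j^*}/\hat{w}^{(1)}_{j^*} \geq \frac{(1-\tilde\epsilon/10)(2\tilde\epsilon/t)}{(1+\tilde\epsilon/8)(\tilde\epsilon/t)} > 1+\tilde\epsilon/2$,
causing the algorithm to reject in (either the first or second part of) Line 3-g.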

2. If Item 2 holds, then by the choice of the size of $S_1$, which is $\Theta(t/\tilde\epsilon^2)$, with probability at least
99/100, the sample $S_1$ will contain a point $i$ for which $\frac{D_2(i)}{D_2(r_{j^*})} \notin [1/(1+\alpha_{j^*}+\tilde\epsilon), 1+\alpha_{j^*}+\tilde\epsilon]$,
and by $E_3$ this will be detected in Line 3-h.

3. Similarly, if Item 3 holds, then by the choice of the size of $S_2$, with probability at least 99/100,
the sample $S_2$ will contain a point $i$ for which $\frac{D_2(i)}{D_2(r_{j^*})} \notin [1/(1+\alpha_{j^*}+\tilde\epsilon), 1+\alpha_{j^*}+\tilde\epsilon]$, and by
$E_3$ this will be detected in Line 3-h.

The theorem is thus established. ■

6.2 An approach based on simulating EVAL 

In this subsection we present an alternate approach for testing whether two unknown distributions
$D_1, D_2$ are identical versus $\epsilon$-far. We prove the following theorem:

Theorem 11 COND-Test-Equality-Unknown is an

$$O\left(\frac{(\log N)^5 \cdot (\log(1/\epsilon))^2}{\epsilon^4}\right)$$

-query algorithm with the following properties: given COND$_{D_1}$, COND$_{D_2}$ oracles for any two distributions
$D_1, D_2$ over $[N]$, it outputs ACCEPT with probability at least 2/3 if $D_1 = D_2$ and outputs
REJECT with probability at least 2/3 if $d_{TV}(D_1, D_2) \geq \epsilon$.

At the heart of this result is an efficient simulation of an "approximate EVAL$_D$ oracle" using a
COND$_D$ oracle. (Recall that an EVAL$_D$ oracle is an oracle which, given as input an element $i \in [N]$,
outputs the numerical value $D(i)$.) We feel that this efficient simulation of an approximate EVAL
oracle using a COND oracle is of independent interest since it sheds light on the relative power of
the COND and EVAL models.

In more detail, the starting point of our approach to prove Theorem 11 is a simple algorithm
from [RS09] that uses an EVAL$_D$ oracle to test equality between $D$ and a known distribution $D^*$.
We first show (see Theorem 12) that a modified version of the algorithm, which uses a SAMP oracle
and an "approximate" EVAL oracle, can be used to efficiently test equality between two unknown
distributions $D_1$ and $D_2$. We then show (see Theorem 13) how the required "approximate" EVAL
oracle can be efficiently implemented using a COND oracle. Theorem 11 follows straightforwardly
by combining Theorems 12 and 13.

6.2.1 Approximate EVAL oracles. 

We begin by defining the notion of an "approximate EVAL oracle" that we will use. Intuitively this
is an oracle which gives a multiplicatively $(1 \pm \epsilon)$-accurate estimate of the value of $D(i)$ for all $i$
in a fixed set of probability weight at least $1 - \epsilon$ under $D$. More precisely, we have the following
definition:

Definition 6 Let $D$ be a distribution over $[N]$. An $(\epsilon, \delta)$-approximate EVAL$_D$ simulator is a randomized
procedure ORACLE with the following property: For each $0 < \epsilon < 1$, there is a fixed set
$S^{(\epsilon,D)} \subsetneq [N]$ with $D(S^{(\epsilon,D)}) < \epsilon$ for which the following holds. Given as input an element $i^* \in [N]$,
the procedure ORACLE either outputs a value $\alpha \in [0, 1]$ or outputs UNKNOWN. The following holds
for all $i^* \in [N]$:

(i) If $i^* \notin S^{(\epsilon,D)}$ then with probability at least $1 - \delta$ the output of ORACLE on input $i^*$ is a value
$\alpha \in [0, 1]$ such that $\alpha \in [1-\epsilon, 1+\epsilon]D(i^*)$;

(ii) If $i^* \in S^{(\epsilon,D)}$ then with probability at least $1 - \delta$ the procedure either outputs UNKNOWN or
outputs a value $\alpha \in [0, 1]$ such that $\alpha \in [1-\epsilon, 1+\epsilon]D(i^*)$.

We note that according to the above definition, it may be the case that different calls to ORACLE on
the same input element $i^* \in [N]$ may return different values. However, the "low-weight" set $S^{(\epsilon,D)}$
is an a priori fixed set that does not depend in any way on the input point $i^*$ given to the algorithm.
The key property of an $(\epsilon, \delta)$-approximate EVAL$_D$ oracle is that it reliably gives a multiplicatively
$(1 \pm \epsilon)$-accurate estimate of the value of $D(i)$ for all $i$ in some fixed set of probability weight at least
$1 - \epsilon$ under $D$.

6.2.2 Testing equality between $D_1$ and $D_2$ using an approximate EVAL oracle.

We now show how an approximate EVAL$_{D_1}$ oracle, an approximate EVAL$_{D_2}$ oracle, and SAMP$_{D_1}$ and SAMP$_{D_2}$
oracles can be used together to test whether $D_1 = D_2$ versus $d_{TV}(D_1, D_2) \geq \epsilon$. As mentioned earlier,
the approach is a simple extension of the EVAL algorithm given in Observation 24 of [RS09].

Theorem 12 Let ORACLE$_1$ be an $(\epsilon/100, \epsilon/100)$-approximate EVAL$_{D_1}$ simulator and let ORACLE$_2$
be an $(\epsilon/100, \epsilon/100)$-approximate EVAL$_{D_2}$ simulator. There is an algorithm Test-Equality-Unknown
with the following properties: for any distributions $D_1, D_2$ over $[N]$, algorithm
Test-Equality-Unknown makes $O(1/\epsilon)$ queries to ORACLE$_1$, ORACLE$_2$, SAMP$_{D_1}$ and SAMP$_{D_2}$, and it outputs
ACCEPT with probability at least 7/10 if $D_1 = D_2$ and outputs REJECT with probability at least
7/10 if $d_{TV}(D_1, D_2) \geq \epsilon$.



Algorithm 9: Test-Equality-Unknown
Input: query access to ORACLE$_1$, to ORACLE$_2$, and access to SAMP$_{D_1}$ and SAMP$_{D_2}$ oracles
1: Call the SAMP$_{D_1}$ oracle $m = 5/\epsilon$ times to obtain points $h_1, \ldots, h_m$ distributed according to
$D_1$.
2: Call the SAMP$_{D_2}$ oracle $m = 5/\epsilon$ times to obtain points $h_{m+1}, \ldots, h_{2m}$ distributed according
to $D_2$.
3: for $j = 1$ to $2m$ do
4: Call ORACLE$_1(h_j)$. If it returns UNKNOWN then output REJECT, otherwise let $v_{1,j} \in [0, 1]$
be the value it outputs.
5: Call ORACLE$_2(h_j)$. If it returns UNKNOWN then output REJECT, otherwise let $v_{2,j} \in [0, 1]$
be the value it outputs.
6: if $v_{1,j} \notin [1-\epsilon/8, 1+\epsilon/8]v_{2,j}$ then
7: output REJECT and exit
8: end if
9: end for
10: output ACCEPT
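The control flow of Algorithm 9 can be sketched in a few lines. This is an illustrative rendering only: the oracles are modeled as plain Python callables, UNKNOWN is modeled as `None`, and all names below are assumptions made for this sketch rather than part of the paper.

```python
import math
from typing import Callable, Optional

UNKNOWN = None  # illustrative stand-in for the UNKNOWN response


def equality_tester(
    oracle1: Callable[[int], Optional[float]],  # approximate EVAL simulator for D1
    oracle2: Callable[[int], Optional[float]],  # approximate EVAL simulator for D2
    samp1: Callable[[], int],                   # draws a point according to D1
    samp2: Callable[[], int],                   # draws a point according to D2
    eps: float,
) -> str:
    """Return "ACCEPT" or "REJECT" following the steps of Algorithm 9."""
    m = math.ceil(5.0 / eps)
    # Lines 1-2: m points from each distribution.
    points = [samp1() for _ in range(m)] + [samp2() for _ in range(m)]
    # Lines 3-9: compare the two approximate evaluations at every point.
    for h in points:
        v1 = oracle1(h)
        if v1 is UNKNOWN:
            return "REJECT"
        v2 = oracle2(h)
        if v2 is UNKNOWN:
            return "REJECT"
        if not ((1.0 - eps / 8.0) * v2 <= v1 <= (1.0 + eps / 8.0) * v2):
            return "REJECT"
    # Line 10.
    return "ACCEPT"


# Toy usage with exact (noise-free) oracles and constant samplers.
d_a = {1: 0.5, 2: 0.5}
d_b = {1: 0.9, 2: 0.1}
same = equality_tester(d_a.get, d_a.get, lambda: 1, lambda: 2, 0.1)
diff = equality_tester(d_a.get, d_b.get, lambda: 1, lambda: 1, 0.1)
```

With exact oracles the first call accepts and the second rejects at the first sampled point, since 0.5 lies outside $[1-\epsilon/8, 1+\epsilon/8] \cdot 0.9$.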



It is clear that Test-Equality-Unknown makes $O(1/\epsilon)$ queries as claimed. To prove Theorem 12
we argue completeness and soundness below.

Completeness: Suppose that $D_1 = D_2$. Since ORACLE$_1$ is an $(\epsilon/100, \epsilon/100)$-approximate EVAL$_{D_1}$
simulator, the probability that any of the $2m = 10/\epsilon$ points $h_1, \ldots, h_{2m}$ drawn in Lines 1 and 2
lies in $S^{(\epsilon/100,D_1)}$ is at most 1/10. Going forth, let us assume that all points $h_j$ indeed lie outside
$S^{(\epsilon/100,D_1)}$. Then for each execution of Line 4 we have that with probability at least $1 - \epsilon/100$
the call to ORACLE$_1(h_j)$ yields a value $v_{1,j}$ satisfying $v_{1,j} \in [1-\frac{\epsilon}{100}, 1+\frac{\epsilon}{100}]D_1(h_j)$. The same holds
for each execution of Line 5. Since there are $20/\epsilon$ total executions of Lines 4 and 5, with overall
probability at least 7/10 we have that each $1 \leq j \leq 2m$ has $v_{1,j}, v_{2,j} \in [1-\frac{\epsilon}{100}, 1+\frac{\epsilon}{100}]D_1(h_j)$. If this
is the case then $v_{1,j}, v_{2,j}$ pass the check in Line 6, and thus the algorithm outputs ACCEPT with
overall probability at least 7/10.

Soundness: Now suppose that $d_{TV}(D_1, D_2) \geq \epsilon$. Let us say that $i \in [N]$ is good if $D_1(i) \in
[1-\epsilon/5, 1+\epsilon/5]D_2(i)$. Let BAD $\subseteq [N]$ denote the set of all $i \in [N]$ that are not good. We have

$$2d_{TV}(D_1, D_2) = \sum_{i \text{ is good}} |D_1(i) - D_2(i)| + \sum_{i \text{ is bad}} |D_1(i) - D_2(i)| \geq 2\epsilon.$$

Since

$$\sum_{i \text{ is good}} |D_1(i) - D_2(i)| \leq \sum_{i \text{ is good}} \frac{\epsilon}{5}D_2(i) \leq \frac{\epsilon}{5},$$

we have

$$\sum_{i \text{ is bad}} \left(|D_1(i)| + |D_2(i)|\right) \geq \sum_{i \text{ is bad}} |D_1(i) - D_2(i)| \geq \frac{9}{5}\epsilon.$$

Consequently it must be the case that either $D_1(\mathrm{BAD}) \geq \frac{9}{10}\epsilon$ or $D_2(\mathrm{BAD}) \geq \frac{9}{10}\epsilon$. For the rest of
the argument we suppose that $D_1(\mathrm{BAD}) \geq \frac{9}{10}\epsilon$ (by the symmetry of the algorithm, an identical
argument to the one we give below but with the roles of $D_1$ and $D_2$ flipped throughout handles the
other case).

Since $D_1(\mathrm{BAD}) \geq \frac{9}{10}\epsilon$, a simple calculation shows that with probability at least 98/100 at least
one of the $5/\epsilon$ points $h_1, \ldots, h_m$ drawn in Line 1 belongs to BAD. For the rest of the argument we
suppose that indeed (at least) one of these points is in BAD; let $h_{i^*}$ be such a point. Now consider
the execution of Line 4 when ORACLE$_1$ is called on $h_{i^*}$. By Definition 6, whether or not $i^*$ belongs to
$S^{(\epsilon/100,D_1)}$, with probability at least $1 - \epsilon/100$ the call to ORACLE$_1$ either causes Test-Equality-Unknown
to REJECT in Line 4 (because ORACLE$_1$ returns UNKNOWN) or it returns a value
$v_{1,i^*} \in [1-\frac{\epsilon}{100}, 1+\frac{\epsilon}{100}]D_1(i^*)$. We may suppose that it returns a value $v_{1,i^*} \in [1-\frac{\epsilon}{100}, 1+\frac{\epsilon}{100}]D_1(i^*)$.
Similarly, in the execution of Line 5 when ORACLE$_2$ is called on $h_{i^*}$, whether or not $i^*$ belongs to
$S^{(\epsilon/100,D_2)}$, with probability at least $1 - \epsilon/100$ the call to ORACLE$_2$ either causes Test-Equality-Unknown
to reject in Line 5 or it returns a value $v_{2,i^*} \in [1-\frac{\epsilon}{100}, 1+\frac{\epsilon}{100}]D_2(i^*)$. We may suppose
that it returns a value $v_{2,i^*} \in [1-\frac{\epsilon}{100}, 1+\frac{\epsilon}{100}]D_2(i^*)$. But recalling that $i^* \in \mathrm{BAD}$, an easy
calculation shows that the values $v_{1,i^*}$ and $v_{2,i^*}$ must be multiplicatively far enough from each
other that the algorithm will output REJECT in Line 7. Thus with overall probability at least
96/100 the algorithm outputs REJECT. ■

6.2.3 Constructing an approximate EVAL$_D$ simulator using COND$_D$

In this subsection we show that a COND$_D$ oracle can be used to obtain an approximate EVAL
simulator:



Theorem 13 Let $D$ be any distribution over $[N]$ and let $0 < \epsilon, \delta < 1$. The algorithm
Approx-Eval-Simulator has the following properties: It uses

$$O\left(\frac{(\log N)^5 \cdot (\log(1/\delta))^2}{\epsilon^3}\right)$$

calls to COND$_D$ and it is an $(\epsilon, \delta)$-approximate EVAL$_D$ simulator.

A few notes: First, in the proof we give below of Theorem 13 we assume throughout that
$0 < \epsilon \leq c$, where $c$ is a small absolute constant. This incurs no loss of generality because if the
desired e parameter is in (c, 1) then the parameter can simply be set to c/2. We further note that in 
keeping with our requirement on a COND$_D$ algorithm, the algorithm Approx-Eval-Simulator
only ever calls the COND$_D$ oracle on sets $S$ which are either $S = [N]$ or else contain at least one
element $i$ that has been returned as the output of an earlier call to COND$_D$. (To see this, note that
Line 6 is the only line when COND$_D$ queries are performed. In the first execution of the outer "For"
loop clearly all COND queries are on set $S_0 = [N]$. In subsequent stages the only way a set $S_j$ is
formed is if either (i) $S_j$ is set to $\{i^*\}$ in Line 10, in which case clearly $i^*$ was previously received as
the response of a COND$_D(S_{j-1})$ query, or else (ii) a nonzero fraction of elements $i_1, \ldots, i_m$ received
as responses to COND$_D(S_{j-1})$ queries belong to $S_j$ (see Line 19).)

A preliminary simplification. Fix a distribution $D$ over $[N]$. Let $Z$ denote supp$(D)$, i.e.
$Z = \{i \in [N] : D(i) > 0\}$. We first claim that in proving Theorem 13 we may assume without loss
of generality that no two distinct elements $i, j \in Z$ have $D(i) = D(j)$; in other words, we shall
prove the theorem under this assumption on $D$, and we claim that this implies the general result.
To see this, observe that if $Z$ contains elements $i \neq j$ with $D(i) = D(j)$, then for any arbitrarily
small $\xi > 0$ and any arbitrarily large $M$ we can perturb the weights of elements in $Z$ to obtain a
distribution $D'$ supported on $Z$ such that (i) no two elements of $Z$ have the same probability under
$D'$, and (ii) for every $S \subseteq [N]$ with $S \cap Z \neq \emptyset$ we have $d_{TV}(D_S, D'_S) \leq \xi/M$. Since the variation distance
between $D'_S$ and $D_S$ is at most $\xi/M$ for an arbitrarily small $\xi$, the variation distance between (the
execution of any $M$-query COND algorithm run on $D$) and (the execution of any $M$-query COND
algorithm run on $D'$) will be at most $\xi$. Since $\xi$ can be made arbitrarily small this means that
indeed without loss of generality we may work with $D'$ in what follows.

Thus, we henceforth assume that the distribution $D$ has no two elements in supp$(D)$ with the
same weight. For such a distribution we can explicitly describe the set $S^{(\epsilon,D)}$ from Definition 6 that
our analysis will deal with. Let $\pi : \{1, \ldots, |Z|\} \to Z$ be the bijection such that $D(\pi(1)) > \cdots >
D(\pi(|Z|))$ (note that the bijection $\pi$ is uniquely defined by the assumption that $D(i) \neq D(j)$ for all
distinct $i, j \in Z$). Given a value $0 < \tau < 1$ we define the set $L_{\tau,D}$ to be $([N] \setminus Z) \cup \{\pi(s), \ldots, \pi(|Z|)\}$
where $s$ is the smallest index in $\{1, \ldots, |Z|\}$ such that $\sum_{j=s}^{|Z|} D(\pi(j)) < \tau$ (if $D(\pi(|Z|))$ itself is at
least $\tau$ then we define $L_{\tau,D} = [N] \setminus Z$). Thus intuitively $L_{\tau,D}$ contains the $\tau$ fraction (w.r.t. $D$) of
$[N]$ consisting of the lightest elements. The desired set $S^{(\epsilon,D)}$ is precisely $L_{\epsilon,D}$.
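The set of lightest elements can be computed directly when the distribution is known explicitly. The following sketch is for illustration only: it assumes the distribution is given as a dictionary over the whole domain with pairwise distinct positive weights (as in the simplification above), and the function name is hypothetical.

```python
from typing import Dict, Set


def light_set(D: Dict[int, float], tau: float) -> Set[int]:
    """Return the zero-weight points together with the longest suffix of
    lightest support elements whose total weight is strictly below tau."""
    light = {i for i, p in D.items() if p == 0.0}           # [N] \ Z
    support = sorted((i for i, p in D.items() if p > 0.0),  # lightest first
                     key=lambda i: D[i])
    tail_weight = 0.0
    for i in support:
        if tail_weight + D[i] >= tau:
            break                   # adding i would reach weight tau
        tail_weight += D[i]
        light.add(i)
    return light


example = {1: 0.5, 2: 0.3, 3: 0.15, 4: 0.05, 5: 0.0}
```

For the example distribution, `light_set(example, 0.25)` contains the zero-weight point 5 and the two lightest support points 3 and 4, whose combined weight is below 0.25; with `tau = 0.04` only the zero-weight point remains.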

Intuition for the algorithm. The high-level idea of the EVAL$_D$ simulation is the following: Let
$i^* \in [N]$ be the input element given to the EVAL$_D$ simulator. The algorithm works in a sequence
of stages. Before performing the $j$-th stage it maintains a set $S_{j-1}$ that contains $i^*$, and it has
a high-accuracy estimate $\hat{D}(S_{j-1})$ of the value of $D(S_{j-1})$. (The initial set $S_0$ is simply $[N]$ and
the initial estimate $\hat{D}(S_0)$ is of course 1.) In the $j$-th stage the algorithm attempts to construct a
subset $S_j$ of $S_{j-1}$ in such a way that (i) $i^* \in S_j$, and (ii) it is possible to obtain a high-accuracy
estimate of $D(S_j)/D(S_{j-1})$ (and thus a high-accuracy estimate of $D(S_j)$). If the algorithm cannot
construct such a set $S_j$ then it outputs UNKNOWN; otherwise, after at most (essentially) $O(\log N)$
stages, it reaches a situation where $S_j = \{i^*\}$ and so the high-accuracy estimate of $D(S_j) = D(i^*)$
is the desired value.

A natural first idea towards implementing this high-level plan is simply to split $S_{j-1}$ randomly
into two pieces and use one of them as $S_j$. However this simple approach may not work; for example,
if $S_{j-1}$ has one or more elements which are very heavy compared to $i^*$, then with a random split
it may not be possible to efficiently estimate $D(S_j)/D(S_{j-1})$ as required in (ii) above. Thus we
follow a more careful approach which first identifies and removes "heavy" elements from $S_{j-1}$ in
each stage.

In more detail, during the $j$-th stage, the algorithm first performs COND$_D$ queries on the
set $S_{j-1}$ to identify a set $H_j \subseteq S_{j-1}$ of "heavy" elements; this set essentially consists of all elements
which individually each contribute at least a $\kappa$ fraction of the total mass $D(S_{j-1})$. (Here
$\kappa$ is a "not-too-small" quantity but it is significantly less than $\epsilon$.) Next, the algorithm performs
additional COND$_D$ queries to estimate $D(i^*)/D(S_{j-1})$. If this fraction exceeds $\kappa/20$ then it is
straightforward to estimate $D(i^*)/D(S_{j-1})$ to high accuracy, so using $\hat{D}(S_{j-1})$ it is possible to
obtain a high-quality estimate of $D(i^*)$ and the algorithm can conclude. However, the typical
case is that $D(i^*)/D(S_{j-1}) < \kappa/20$. In this case, the algorithm next estimates $D(H_j)/D(S_{j-1})$. If
this is larger than $1 - \epsilon/10$ then the algorithm outputs UNKNOWN (see below for more discussion
of this). If $D(H_j)/D(S_{j-1})$ is less than $1 - \epsilon/10$ then $D(S_{j-1} \setminus H_j)/D(S_{j-1}) \geq \epsilon/10$ (and
so $D(S_{j-1} \setminus H_j)/D(S_{j-1})$ can be efficiently estimated to high accuracy), but each element $k$ of
$S_{j-1} \setminus H_j$ has $D(k)/D(S_{j-1}) \leq \kappa < \epsilon/10 \leq D(S_{j-1} \setminus H_j)/D(S_{j-1})$. Thus it must be the case that
the weight under $D$ of $S_{j-1} \setminus H_j$ is "spread out" over many "light" elements.

Given that this is the situation, the algorithm next chooses $S'_j$ to be a random subset of $S_{j-1} \setminus (H_j \cup \{i^*\})$, and sets $S_j$ to be $S'_j \cup \{i^*\}$. It can be shown that with high probability (over the random choice of $S_j$) it will be the case that $D(S_j) \ge \frac{1}{3} D(S_{j-1} \setminus H_j)$ (this relies crucially on the fact that the weight under $D$ of $S_{j-1} \setminus H_j$ is "spread out" over many "light" elements). This makes it possible to efficiently estimate $D(S_j)/D(S_{j-1} \setminus H_j)$; together with the high-accuracy estimate of $D(S_{j-1} \setminus H_j)/D(S_{j-1})$ noted above, and the high-accuracy estimate $\hat{D}(S_{j-1})$ of $D(S_{j-1})$, this means it is possible to efficiently estimate $D(S_j)$ to high accuracy as required for the next stage. (We note that after defining $S_j$ but before proceeding to the next stage, the algorithm actually checks to be sure that $S_j$ contains at least one point that was returned from the $\mathrm{COND}_D(S_{j-1})$ calls made in the past stage. This check ensures that whenever the algorithm calls $\mathrm{COND}_D(S)$ on a set $S$, it is guaranteed that $D(S) > 0$ as required by our $\mathrm{COND}_D$ model. Our analysis shows that doing this check does not affect correctness of the algorithm since with high probability the check always passes.)

Intuition for the analysis. We require some definitions to give the intuition for the analysis establishing correctness. Fix a nonempty subset $S \subseteq [N]$. Let $\pi_S$ be the bijection mapping $\{1, \dots, |S|\}$ to $S$ in such a way that $D_S(\pi_S(1)) \ge \cdots \ge D_S(\pi_S(|S|))$, i.e. $\pi_S(1), \dots, \pi_S(|S|)$ is a listing of the elements of $S$ in order from heaviest under $D_S$ to lightest under $D_S$. Given $j \in S$, we define the $S$-rank of $j$, denoted $\mathrm{rank}_S(j)$, to be the value $\sum_{i : D_S(\pi_S(i)) \le D_S(j)} D_S(\pi_S(i))$, i.e. $\mathrm{rank}_S(j)$ is the sum of the weights (under $D_S$) of all the elements in $S$ that are no heavier than $j$ under $D_S$. Note that having $i^* \notin L_{\epsilon,D}$ implies that $\mathrm{rank}_{[N]}(i^*) \ge \epsilon$.
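To make the definition of the $S$-rank concrete, the following is a minimal Python sketch. It assumes the distribution is available explicitly as a dictionary mapping elements to probabilities (which the testing algorithm itself never has); the function name `s_rank` is hypothetical and chosen only for illustration.

```python
def s_rank(D, S, j):
    """Return rank_S(j): the total D_S-weight of elements of S no heavier than j.

    D is a dict {element: probability} (an explicit description, assumed here
    purely for illustration); S is a nonempty subset with D(S) > 0; j is in S.
    """
    mass_S = sum(D[i] for i in S)                      # D(S)
    light = sum(D[i] for i in S if D[i] <= D[j])       # elements no heavier than j
    return light / mass_S                              # weights measured under D_S


# Example: under D = (0.5, 0.3, 0.2), element 2 has rank 0.3 + 0.2 = 0.5 in [N].
D = {1: 0.5, 2: 0.3, 3: 0.2}
full_rank = s_rank(D, {1, 2, 3}, 2)
sub_rank = s_rank(D, {2, 3}, 3)   # within S = {2, 3}: 0.2 / 0.5 = 0.4
```

Note that the rank of an element always includes its own weight, so the heaviest element of $S$ has $S$-rank 1.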

We first sketch the argument for correctness. (It is easy to show that the algorithm only outputs FAIL with very small probability so we ignore this possibility below.) Suppose first that $i^* \notin L_{\epsilon,D}$. A key lemma shows that if $i^* \notin L_{\epsilon,D}$ (and hence $\mathrm{rank}_{[N]}(i^*) \ge \epsilon$), then with high probability every set $S_{j-1}$ constructed by the algorithm is such that $\mathrm{rank}_{S_{j-1}}(i^*) \ge \epsilon/2$. (In other words, if $i^*$ is not initially among the $\epsilon$-fraction (under $D$) of lightest elements, then it never "falls too far" to become part of the $\epsilon/2$-fraction (under $D_{S_{j-1}}$) of lightest elements for $S_{j-1}$, for any $j$.) Given that (whp) $i^*$ always has $\mathrm{rank}_{S_{j-1}}(i^*) \ge \epsilon/2$, though, it must be the case that (whp) the procedure does not output UNKNOWN (and hence it must whp output a numerical value). This is because there are only two places where the procedure can output UNKNOWN, in Lines 14 and 19; we consider both cases below.

1. In order for the procedure to output UNKNOWN in Line 14 it must be the case that the elements of $H_j$ (each of which individually has weight at least $\kappa/2$ under $D_{S_{j-1}}$) collectively have weight at least $1 - 3\epsilon/20$ under $D_{S_{j-1}}$ by Line 13. But $i^*$ has weight at most $3\kappa/40$ under $D_{S_{j-1}}$ (because the procedure did not go to Line 2 in Line 10), and thus $i^*$ would need to be in the bottom $3\epsilon/20$ of the lightest elements, i.e. it would need to have $\mathrm{rank}_{S_{j-1}}(i^*) \le 3\epsilon/20$; but this contradicts $\mathrm{rank}_{S_{j-1}}(i^*) \ge \epsilon/2$.

2. In order for the procedure to output UNKNOWN in Line 19 it must be the case that all elements $i_1, \dots, i_m$ drawn in Line 6 are not chosen for inclusion in $S_j$. In order for the algorithm to reach Line 19, though, it must be the case that at least $(\epsilon/10 - \kappa/20)m$ of these draws do not belong to $H_j \cup \{i^*\}$; since these draws do not belong to $H_j$ each one occurs only a small number of times among the $m$ draws, so there must be many distinct values, and hence the probability that none of these distinct values is chosen for inclusion in $S'_j$ is very low.

Thus we have seen that if $i^* \notin L_{\epsilon,D}$, then whp the procedure outputs a numerical value; it remains to show that whp this value is a high-accuracy estimate of $D(i^*)$. However, this follows easily from the fact that we inductively maintain a high-quality estimate of $D(S_{j-1})$ and the fact that the algorithm ultimately constructs its estimate of $D(i^*)$ only when it additionally has a high-quality estimate of $D(i^*)/D(S_{j-1})$. This fact also handles the case in which $i^* \in L_{\epsilon,D}$: in such a case it is allowable for the algorithm to output UNKNOWN, so since the algorithm whp outputs a high-accuracy estimate when it outputs a numerical value, this means the algorithm performs as required in Case (ii) of Definition 6.

We now sketch the argument for query complexity. We will show that the heavy elements can be identified in each stage using $\mathrm{poly}(\log N, 1/\epsilon)$ queries. Since the algorithm constructs $S_j$ by taking a random subset of $S_{j-1}$ (together with $i^*$) at each stage, the number of stages is easily bounded by (essentially) $O(\log N)$. Since the final probability estimate for $D(i^*)$ is a product of $O(\log N)$ conditional probabilities, it suffices to estimate each of these conditional probabilities to within a multiplicative factor of $(1 \pm O(\epsilon/\log N))$. We show that each conditional probability estimate can be carried out to this required precision using only $\mathrm{poly}(\log N, 1/\epsilon)$ calls to $\mathrm{COND}_D$; given this, the overall $\mathrm{poly}(\log N, 1/\epsilon)$ query bound follows straightforwardly.

Now we enter into the actual proof. We begin our analysis with a simple but useful lemma about the "heavy" elements identified in Line 7.

Lemma 16 With probability at least $1 - \delta/K$, every set $H_j$ that is ever constructed in Line 7 satisfies the following for all $\ell \in S_{j-1}$:

(i) If $D(\ell)/D(S_{j-1}) \ge \kappa$, then $\ell \in H_j$;

(ii) If $D(\ell)/D(S_{j-1}) < \kappa/2$ then $\ell \notin H_j$.
Algorithm 10: Approx-Eval-Simulator
Input: access to $\mathrm{COND}_D$; parameters $0 < \epsilon, \delta < 1$; input element $i^* \in [N]$
 1: Set $S_0 = [N]$ and $\hat{D}(S_0) = 1$. Set $K = 9$. Set $M = \log N + \log(K/\delta) + 1$. Set $\kappa = \Theta(\epsilon/(M^2 \log(M/\delta)))$.
 2: for $j = 1$ to $M$ do
 3:   if $|S_{j-1}| = 1$ then
 4:     return $\hat{D}(S_{j-1})$ (and exit)
 5:   end if
 6:   Perform $m = \Theta(\max\{M^2 \log(M/\delta)/(\epsilon^2 \kappa),\ \log(M/(\delta\kappa))/\kappa^2\})$ $\mathrm{COND}_D$ queries on $S_{j-1}$ to obtain points $i_1, \dots, i_m \in S_{j-1}$.
 7:   Let $H_j = \{k \in [N] : k$ appears at least $\frac{3}{4}\kappa m$ times in the list $i_1, \dots, i_m\}$
 8:   Let $\hat{D}_{S_{j-1}}(i^*)$ denote the fraction of times that $i^*$ appears in $i_1, \dots, i_m$
 9:   if $\hat{D}_{S_{j-1}}(i^*) \ge \kappa/20$ then
10:     Set $S_j = \{i^*\}$, set $\hat{D}(S_j) = \hat{D}_{S_{j-1}}(i^*) \cdot \hat{D}(S_{j-1})$, increment $j$, and go to Line 2
11:   end if
12:   Let $\hat{D}_{S_{j-1}}(H_j)$ denote the fraction of elements among $i_1, \dots, i_m$ that belong to $H_j$.
13:   if $\hat{D}_{S_{j-1}}(H_j) > 1 - \epsilon/10$ then
14:     return UNKNOWN (and exit)
15:   end if
16:   Set $S'_j$ to be a uniform random subset of $S_{j-1} \setminus (H_j \cup \{i^*\})$ and set $S_j$ to be $S'_j \cup \{i^*\}$.
17:   Let $\hat{D}_{S_{j-1}}(S_j)$ denote the fraction of elements among $i_1, \dots, i_m$ that belong to $S_j$
18:   if $\hat{D}_{S_{j-1}}(S_j) = 0$ then
19:     return UNKNOWN (and exit)
20:   end if
21:   Set $\hat{D}(S_j) = \hat{D}_{S_{j-1}}(S_j) \cdot \hat{D}(S_{j-1})$
22: end for
23: Output FAIL.
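The control flow of Approx-Eval-Simulator can be sketched in Python as follows. This is only an illustration under several assumptions that are not part of the paper: the $\mathrm{COND}_D$ oracle is simulated from an explicitly known distribution (the helper `cond_sample` is hypothetical), logarithms are taken base 2, and the parameters `kappa` and `m` are passed in by the caller because the constants hidden in the $\Theta(\cdot)$ notation are not specified. With ad hoc parameter choices the accuracy guarantees of the analysis need not hold.

```python
import math
import random
from collections import Counter


def cond_sample(D, S, m, rng):
    """Simulated COND_D oracle: m independent draws from D restricted to S.

    D is a dict {element: probability}; this stands in for the real oracle
    and requires D(S) > 0.
    """
    items = sorted(S)
    weights = [D[i] for i in items]
    return rng.choices(items, weights=weights, k=m)


def approx_eval_simulator(D, i_star, eps, delta, kappa, m, rng, K=9):
    """Follow the control flow of Algorithm 10; kappa and m are caller-chosen."""
    N = len(D)
    # Assumption: logarithms are base 2 and M is rounded up to an integer.
    M = math.ceil(math.log2(N) + math.log2(K / delta)) + 1
    S_prev = set(D)   # S_0 = [N]
    est_prev = 1.0    # running estimate of D(S_{j-1}); D(S_0) = 1
    for _ in range(M):
        if len(S_prev) == 1:                                   # Lines 3-5
            return est_prev
        counts = Counter(cond_sample(D, S_prev, m, rng))       # Line 6
        heavy = {k for k, c in counts.items()
                 if c >= 0.75 * kappa * m}                     # Line 7
        frac_star = counts[i_star] / m                         # Line 8
        if frac_star >= kappa / 20:                            # Lines 9-11
            S_prev, est_prev = {i_star}, frac_star * est_prev
            continue                                           # back to Line 2
        frac_heavy = sum(counts[k] for k in heavy) / m         # Line 12
        if frac_heavy > 1 - eps / 10:                          # Lines 13-15
            return "UNKNOWN"
        light = S_prev - heavy - {i_star}
        # Line 16: keep each light element independently with probability 1/2.
        S_next = {i for i in sorted(light) if rng.random() < 0.5} | {i_star}
        frac_next = sum(counts[k] for k in S_next) / m         # Line 17
        if frac_next == 0:                                     # Lines 18-20
            return "UNKNOWN"
        S_prev, est_prev = S_next, frac_next * est_prev        # Line 21
    return "FAIL"                                              # Line 23


rng = random.Random(0)
point_mass = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
value = approx_eval_simulator(point_mass, 0, eps=0.5, delta=0.1,
                              kappa=0.1, m=200, rng=rng)
```

On the point-mass example every conditional draw equals $i^* = 0$, so the branch at Line 9 is taken in the first stage and the returned estimate is exactly 1. The check at Line 18 is what guarantees that every set passed to the simulated oracle has positive mass.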
Proof: Fix an iteration $j$. By Line 7 in the algorithm, a point $\ell$ is included in $H_j$ if it appears at least $\frac{3}{4}\kappa m$ times among $i_1, \dots, i_m$ (which are the output of $\mathrm{COND}_D$ queries on $S_{j-1}$). For the first item, fix an element $\ell$ such that $D(\ell)/D(S_{j-1}) \ge \kappa$. Recall that $m = \Omega(M^2 \log(M/\delta)/(\epsilon^2 \kappa)) = \Omega(\log(MN/\delta)/\kappa)$ (since $M = \Omega(\log N)$). By a multiplicative Chernoff bound, the probability (over the choice of $i_1, \dots, i_m$ in $S_{j-1}$) that $\ell$ appears less than $\frac{3}{4}\kappa m$ times among $i_1, \dots, i_m$ (that is, less than 3/4 times the lower bound on the expected value) is at most $\delta/(KMN)$ (for an appropriate constant in the setting of $m$). On the other hand, for each fixed $\ell$ such that $D(\ell)/D(S_{j-1}) < \kappa/2$, the probability that $\ell$ appears at least $\frac{3}{4}\kappa m$ times (that is, at least 3/2 times the upper bound on the expected value) is at most $\delta/(KMN)$ as well. The lemma follows by taking a union bound over all (at most $N$) points considered above and over all $M$ settings of $j$. ■

Next we show that with high probability Algorithm Approx-Eval-Simulator returns either UNKNOWN or a numerical value (as opposed to outputting FAIL in Line 23):

Lemma 17 For any $D$, $\epsilon$, $\delta$ and $i^*$, Algorithm Approx-Eval-Simulator outputs FAIL with probability at most $\delta/K$.

Proof: Fix any element $i \ne i^*$. The probability (taken only over the choice of the random subset in each execution of Line 16) that $i$ is placed in $S'_j$ in each of the first $\log N + \log(K/\delta)$ executions of Line 16 is at most $\frac{\delta}{KN}$. Taking a union bound over all $N - 1$ points $i \ne i^*$, the probability that any point other than $i^*$ remains in $S_{j-1}$ through all of the first $\log N + \log(K/\delta)$ executions of the outer "for" loop is at most $\delta/K$. Assuming that this holds, then in the execution of the outer "for" loop when $j = \log N + \log(K/\delta) + 1$, the algorithm will return $\hat{D}(S_{j-1}) = \hat{D}(i^*)$ in Line 4. ■
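The arithmetic behind this proof is simple enough to check numerically. The sketch below assumes base-2 logarithms and treats each stage as an independent fair coin flip per element, as in the argument above; the function name `survival_bound` is hypothetical.

```python
import math


def survival_bound(N, K, delta):
    """Return (per-element survival probability, union bound over N-1 elements).

    An element other than i* survives one execution of Line 16 with
    probability 1/2, so it survives log N + log(K/delta) executions with
    probability 2^{-(log N + log(K/delta))} = delta / (K N).
    """
    rounds = math.log2(N) + math.log2(K / delta)
    per_element = 0.5 ** rounds
    return per_element, (N - 1) * per_element


per_element, union = survival_bound(N=1024, K=9, delta=0.09)
```

For these illustrative values the per-element probability equals $\delta/(KN) = 0.09/(9 \cdot 1024)$ and the union bound stays below $\delta/K = 0.01$.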

For the rest of the analysis it will be helpful for us to define several "desirable" events and show that they all hold with high probability:

1. Let $E_1$ denote the event that every set $H_j$ that is ever constructed in Line 7 satisfies both properties (i) and (ii) stated in Lemma 16. By Lemma 16 the event $E_1$ holds with probability at least $1 - \delta/K$.

2. Let $E_2$ denote the event that in every execution of Line 9 the estimate $\hat{D}_{S_{j-1}}(i^*)$ is within an additive $\kappa/40$ of the true value of $D(i^*)/D(S_{j-1})$. By the choice of $m$ in Line 6 (i.e., using $m = \Omega(\log(M/\delta)/\kappa^2)$), an additive Chernoff bound, and a union bound over all iterations, the event $E_2$ holds with probability at least $1 - \delta/K$.

3. Let $E_3$ denote the event that if Line 10 is executed, the resulting value $\hat{D}_{S_{j-1}}(i^*)$ lies in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(i^*)/D(S_{j-1})$. Assuming that event $E_2$ holds, if Line 10 is reached then the true value of $D(i^*)/D(S_{j-1})$ must be at least $\kappa/40$, and consequently a multiplicative Chernoff bound and the choice of $m$ (i.e. using $m = \Omega(M^2 \log(M/\delta)/(\epsilon^2 \kappa))$) together imply that $\hat{D}_{S_{j-1}}(i^*)$ lies in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(i^*)/D(S_{j-1})$ except with failure probability at most $\delta/K$.

4. Let $E_4$ denote the event that in every execution of Line 12 the estimate $\hat{D}_{S_{j-1}}(H_j)$ is within an additive error of $\epsilon/20$ from the true value of $D(H_j)/D(S_{j-1})$. By the choice of $m$ in Line 6 (i.e., using $m = \Omega(\log(M/\delta)/\epsilon^2)$) and an additive Chernoff bound, the event $E_4$ holds with probability at least $1 - \delta/K$.
The above arguments show that $E_1$ through $E_4$ all hold with probability at least $1 - 4\delta/K$.

Let $E_5$ denote the event that in every execution of Line 16 the set $S'_j$ which is drawn satisfies $D(S'_j)/D(S_{j-1} \setminus (H_j \cup \{i^*\})) \ge 1/3$. The following lemma says that conditioned on $E_1$ through $E_4$ all holding, event $E_5$ holds with high probability:

Lemma 18 Conditioned on $E_1$ through $E_4$ the probability that $E_5$ holds is at least $1 - \delta/K$.



Proof: Fix a value of $j$ and consider the $j$-th iteration of Line 16. Since events $E_2$ and $E_4$ hold, it must be the case that $D(S_{j-1} \setminus (H_j \cup \{i^*\}))/D(S_{j-1}) \ge \epsilon/40$. Since event $E_1$ holds, it must be the case that every $i \in S_{j-1} \setminus (H_j \cup \{i^*\})$ has $D(i)/D(S_{j-1}) < \kappa$. Now since $S'_j$ is chosen by independently including each element of $S_{j-1} \setminus (H_j \cup \{i^*\})$ with probability 1/2, we can apply the first part of Corollary 3 and get

\[
\Pr\Bigl[ D(S'_j) < \tfrac{1}{3}\, D\bigl(S_{j-1} \setminus (H_j \cup \{i^*\})\bigr) \Bigr] \;\le\; e^{-4\epsilon/(40 \cdot 9 \cdot 4\kappa)} \;<\; \frac{\delta}{KM},
\]

where the last inequality follows by the setting of $\kappa = \Theta(\epsilon/(M^2 \log(1/\delta)))$. ■

Thus we have established that $E_1$ through $E_5$ all hold with probability at least $1 - 5\delta/K$.



Next, let $E_6$ denote the event that the algorithm never returns UNKNOWN and exits in Line 19. Our next lemma shows that conditioned on events $E_1$ through $E_5$, the probability of $E_6$ is at least $1 - \delta/K$:

Lemma 19 Conditioned on $E_1$ through $E_5$ the probability that $E_6$ holds is at least $1 - \delta/K$.



Proof: Fix any iteration $j$ of the outer "for" loop. In order for the algorithm to reach Line 18 in this iteration, it must be the case (by Lines 9 and 13) that at least $(\epsilon/10 - \kappa/20)m > (\epsilon/20)m$ points in $i_1, \dots, i_m$ do not belong to $H_j \cup \{i^*\}$. Since each point not in $H_j$ appears at most $\frac{3}{4}\kappa m$ times in the list $i_1, \dots, i_m$, there must be at least $\frac{\epsilon}{15\kappa}$ distinct such values. Hence the probability that none of these values is selected to belong to $S'_j$ is at most $1/2^{\epsilon/(15\kappa)} < \delta/(KM)$. A union bound over all (at most $M$) values of $j$ gives that the probability the algorithm ever returns UNKNOWN and exits in Line 19 is at most $\delta/K$, so the lemma is proved. ■

Now let $E_7$ denote the event that in every execution of Line 17 the estimate $\hat{D}_{S_{j-1}}(S_j)$ lies in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(S_j)/D(S_{j-1})$. The following lemma says that conditioned on $E_1$ through $E_5$, event $E_7$ holds with probability at least $1 - \delta/K$:

Lemma 20 Conditioned on $E_1$ through $E_5$, the probability that $E_7$ holds is at least $1 - \delta/K$.



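Proof: Fix a value of $j$ and consider the $j$-th iteration of Line 16. The expected value of $\hat{D}_{S_{j-1}}(S_j)$ is precisely

\[
\frac{D(S_j)}{D(S_{j-1})} \;=\; \frac{D(S_j)}{D(S_{j-1} \setminus (H_j \cup \{i^*\}))} \cdot \frac{D(S_{j-1} \setminus (H_j \cup \{i^*\}))}{D(S_{j-1})}. \tag{59}
\]

Since events $E_2$ and $E_4$ hold we have that $\frac{D(S_{j-1} \setminus (H_j \cup \{i^*\}))}{D(S_{j-1})} \ge \epsilon/40$, and since event $E_5$ holds we have that $\frac{D(S_j)}{D(S_{j-1} \setminus (H_j \cup \{i^*\}))} \ge 1/3$ (note that $D(S_j) \ge D(S'_j)$). Thus we have that (59) is at least $\epsilon/120$. Recalling the value of $m$ (i.e., using $m = \Omega(M^2 \log(M/\delta)/(\epsilon^2 \kappa)) = \Omega(M^2 \log(KM/\delta)/\epsilon^3)$), a multiplicative Chernoff bound gives that indeed $\hat{D}_{S_{j-1}}(S_j) \in [1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}] \cdot D(S_j)/D(S_{j-1})$ with failure probability at most $\delta/(KM)$. A union bound over all $M$ possible values of $j$ finishes the proof. ■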

At this point we have established that events $E_1$ through $E_7$ all hold with probability at least $1 - 7\delta/K$.

We can now argue that each estimate $\hat{D}(S_j)$ is indeed a high-accuracy estimate of the true value $D(S_j)$:

Lemma 21 With probability at least $1 - 7\delta/K$ each estimate $\hat{D}(S_j)$ constructed by Approx-Eval-Simulator lies in $[(1 - \frac{\epsilon}{2M})^j, (1 + \frac{\epsilon}{2M})^j] \cdot D(S_j)$.

Proof: We prove the lemma by showing that if all events $E_1$ through $E_7$ hold, then the following claim (denoted (*)) holds: each estimate $\hat{D}(S_j)$ constructed by Approx-Eval-Simulator lies in $[(1 - \frac{\epsilon}{2M})^j, (1 + \frac{\epsilon}{2M})^j] \cdot D(S_j)$. Thus for the rest of the proof we assume that indeed all events $E_1$ through $E_7$ hold.

The claim (*) is clearly true for $j = 0$. We prove (*) by induction on $j$ assuming it holds for $j - 1$. The only places in the algorithm where $\hat{D}(S_j)$ may be set are Lines 10 and 21. If $\hat{D}(S_j)$ is set in Line 21 then (*) follows from the inductive claim for $j - 1$ and Lemma 20. If $\hat{D}(S_j)$ is set in Line 10 then (*) follows from the inductive claim for $j - 1$ and the fact that event $E_3$ holds. This concludes the proof of the lemma. ■

Finally, we require the following crucial lemma which establishes that if $i^* \notin L_{\epsilon,D}$ (and hence the initial rank $\mathrm{rank}_{[N]}(i^*)$ of $i^*$ is at least $\epsilon$), then with very high probability the rank of $i^*$ never becomes too low during the execution of the algorithm:

Lemma 22 Suppose $i^* \notin L_{\epsilon,D}$. Then with probability at least $1 - \delta/K$, every set $S_{j-1}$ constructed by the algorithm has $\mathrm{rank}_{S_{j-1}}(i^*) \ge \epsilon/2$.

We prove Lemma 22 in Section 6.2.4 below.

With these pieces in place we are ready to prove Theorem 13.

Proof of Theorem 13: It is straightforward to verify that algorithm Approx-Eval-Simulator has the claimed query complexity. We now argue that Approx-Eval-Simulator meets the two requirements (i) and (ii) of Definition 6. Throughout the discussion below we assume that all the "favorable events" in the above analysis (i.e. events $E_1$ through $E_7$, Lemma 17 and Lemma 22) indeed hold as desired (incurring an overall failure probability of at most $\delta$).

Suppose first that $i^* \notin L_{\epsilon,D}$. By Lemma 22 it must be the case that the algorithm does not return UNKNOWN in Line 14. (This is because in order to reach Line 14 it would need to be the case that $D(i^*)/D(S_{j-1}) \le 3\kappa/40$ (so the algorithm does not instead take the branch in Line 10), but since by Lemma 16 every element $k$ in $H_j$ has $D(k)/D(S_{j-1}) \ge \kappa/2$, this means that $i^*$ does not belong to $H_j$. In order to reach Line 14, by event $E_4$ we must have $D(H_j)/D(S_{j-1}) \ge 1 - 3\epsilon/20$. Since every element of $H_j$ has more mass under $D$ (at least $\kappa/2$) than $i^*$ (which has at most $3\kappa/40$), this implies that $\mathrm{rank}_{S_{j-1}}(i^*) \le 3\epsilon/20$. This contradicts Lemma 22.) And by Lemma 19 it must be the case that the algorithm does not return UNKNOWN in Line 19. Thus the algorithm terminates by returning an estimate $\hat{D}(S_j) = \hat{D}(i^*)$ which, by Lemma 21, lies in $[(1 - \frac{\epsilon}{2M})^j, (1 + \frac{\epsilon}{2M})^j] \cdot D(i^*)$. Since $j \le M$ this estimate lies in $[1 - \epsilon, 1 + \epsilon] \cdot D(i^*)$ as required.
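The last step can be spelled out as follows. This is a short worked check, assuming (as in Lemma 21) that each stage contributes a multiplicative factor in $[1 - \frac{\epsilon}{2M}, 1 + \frac{\epsilon}{2M}]$, and that $0 < \epsilon < 1$ and $1 \le j \le M$; it uses only $1 + x \le e^x$, the convexity of $e^{x}$ on $[0, 1/2]$, and Bernoulli's inequality.

```latex
\[
\Bigl(1 + \tfrac{\epsilon}{2M}\Bigr)^{j}
  \;\le\; \Bigl(1 + \tfrac{\epsilon}{2M}\Bigr)^{M}
  \;\le\; e^{\epsilon/2}
  \;\le\; 1 + \epsilon ,
\qquad
\Bigl(1 - \tfrac{\epsilon}{2M}\Bigr)^{j}
  \;\ge\; 1 - \tfrac{j\,\epsilon}{2M}
  \;\ge\; 1 - \tfrac{\epsilon}{2}
  \;\ge\; 1 - \epsilon .
\]
```

The inequality $e^{\epsilon/2} \le 1 + \epsilon$ holds on $[0, 1]$ because $e^{\epsilon/2}$ is convex, equals 1 at $\epsilon = 0$, and equals $e^{1/2} < 2$ at $\epsilon = 1$.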

Now suppose that $i^* \in L_{\epsilon,D}$. By Lemma 17 we may assume that the algorithm either outputs UNKNOWN or a numerical value. As above, Lemma 21 implies that if the algorithm outputs a numerical value then the value lies in $[1 - \epsilon, 1 + \epsilon] \cdot D(i^*)$ as desired. This concludes the proof of Theorem 13. ■



6.2.4 Proof of Lemma 22



The key to proving Lemma 22 will be proving the next lemma. (In the following, for $S$ a set of real numbers we write $\mathrm{sum}(S)$ to denote $\sum_{\alpha \in S} \alpha$.)



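Lemma 23 Fix $0 < \epsilon \le c$ (for a small absolute constant $c$). Set $\kappa = \Theta(\epsilon/(M^2 \log(1/\delta)))$. Let $T = \{\alpha_1, \dots, \alpha_n\}$ be a set of values $\alpha_1 \le \cdots \le \alpha_n$ such that $\mathrm{sum}(T) = 1$. Fix $\ell \in [n]$ and let $T_L = \{\alpha_1, \dots, \alpha_\ell\}$ and let $T_R = \{\alpha_{\ell+1}, \dots, \alpha_n\}$, so $T_L \cup T_R = T$. Assume that $\mathrm{sum}(T_L) \ge \epsilon/2$ and that $\alpha_\ell \le \kappa/10$.

Fix $H$ to be any subset of $T$ satisfying the following two properties: (i) $H$ includes every $\alpha_j$ such that $\alpha_j \ge \kappa$; and (ii) $H$ includes no $\alpha_j$ such that $\alpha_j < \kappa/2$. (Note that consequently $H$ does not intersect $T_L$.)

Let $T'$ be a subset of $T \setminus (H \cup \{\alpha_\ell\})$ selected uniformly at random. Let $T'_L = T' \cap T_L$ and let $T'_R = T' \cap T_R$.

Then we have the following:

1. If $\mathrm{sum}(T_L) \ge 20\epsilon$, then with probability at least $1 - \delta/M$ (over the random choice of $T'$) it holds that

\[
\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})} \;\ge\; 9\epsilon ;
\]

2. If $\epsilon/2 \le \mathrm{sum}(T_L) < 20\epsilon$, then with probability at least $1 - \delta/M$ (over the random choice of $T'$) it holds that

\[
\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})} \;\ge\; \mathrm{sum}(T_L)\,(1 - p),
\]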



where p = ^ 



Proof of Lemma 22 using Lemma 23: We apply Lemma 23 repeatedly at each iteration $j$ of the outer "for" loop. The set $H$ of Lemma 23 corresponds to the set $H_j$ of "heavy" elements that are removed at a given iteration, the set of values $T$ corresponds to the values $D(i)/D(S_{j-1})$ for $i \in S_{j-1}$, and the element $\alpha_\ell$ of Lemma 23 corresponds to $D(i^*)/D(S_{j-1})$. The value $\mathrm{sum}(T_L)$ corresponds to $\mathrm{rank}_{S_{j-1}}(i^*)$ and the value

\[
\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})}
\]

corresponds to $\mathrm{rank}_{S_j}(i^*)$. Observe that since $i^* \notin L_{\epsilon,D}$ we know that initially $\mathrm{rank}_{[N]}(i^*) \ge \epsilon$, which means that the first time we apply Lemma 23 (with $T = \{D(i) : i \in [N]\}$) we have $\mathrm{sum}(T_L) \ge \epsilon$.

By Lemma 23 the probability of failure in any of the (at most $M$) iterations is at most $\delta/K$, so we assume that there is never a failure. Consequently for all $j$ we have that if $\mathrm{rank}_{S_{j-1}}(i^*) \ge 20\epsilon$ then $\mathrm{rank}_{S_j}(i^*) \ge 9\epsilon$, and if $\epsilon/2 \le \mathrm{rank}_{S_{j-1}}(i^*) < 20\epsilon$ then $\mathrm{rank}_{S_j}(i^*) \ge \mathrm{rank}_{S_{j-1}}(i^*) \cdot (1 - p)$. Since $\mathrm{rank}_{S_0}(i^*) \ge \epsilon$, it follows that for all $j \le M$ we have $\mathrm{rank}_{S_j}(i^*) \ge \epsilon \cdot (1 - p)^M > \epsilon/2$. ■

Proof of Lemma 23: We begin with the following claim:

Claim 24 With probability at least $1 - \delta/(2M)$ (over the random choice of $T'$) it holds that $\mathrm{sum}(T'_L) \ge \frac{1}{2} \cdot \mathrm{sum}(T_L) \cdot (1 - p/2)$.

Proof: Recall from the setup that every element $\alpha_i \in T_L$ satisfies $\alpha_i \le \kappa/10$, and $\mathrm{sum}(T_L) \ge \epsilon/2$. Also recall that $\kappa = \Theta(\epsilon/(M^2 \log(1/\delta)))$ and the value of $p$ from the statement of the lemma, so that $p^2 \epsilon/(6\kappa) > \ln(2M/\delta)$. The claim follows by applying the first part of Corollary 3 (with $\gamma = p/2$). ■

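Part (1) of Lemma 23 is an immediate consequence of Claim 24, since in part (1) we have

\[
\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})}
\;\ge\; \mathrm{sum}(T'_L)
\;\ge\; \frac{1}{2} \cdot \mathrm{sum}(T_L) \cdot \Bigl(1 - \frac{p}{2}\Bigr)
\;\ge\; \frac{1}{2} \cdot 20\epsilon \cdot \Bigl(1 - \frac{p}{2}\Bigr)
\;\ge\; 9\epsilon .
\]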

It remains to prove Part (2) of the lemma. We will do this using the following claim:

Claim 25 Suppose $\epsilon/2 \le \mathrm{sum}(T_L) < 20\epsilon$. Then with probability at least $1 - \delta/(2M)$ (over the random choice of $T'$) it holds that $\mathrm{sum}(T'_R) \le \frac{1}{2} \mathrm{sum}(T_R) \cdot (1 + p/2)$.

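Proof: Observe first that $\alpha_i < \kappa$ for each $\alpha_i \in T_R \setminus H$. We consider two cases.

If $\mathrm{sum}(T_R \setminus H) \ge 4\epsilon$, then we apply the first part of Corollary 3 to the $\alpha_i$'s in $T_R \setminus H$ and get that

\begin{align}
\Pr\Bigl[\mathrm{sum}(T'_R) > \tfrac{1}{2}\,\mathrm{sum}(T_R) \cdot (1 + p/2)\Bigr]
  &\le \Pr\Bigl[\mathrm{sum}(T'_R) > \tfrac{1}{2}\,\mathrm{sum}(T_R \setminus H) \cdot (1 + p/2)\Bigr] \notag \\
  &< \exp\bigl(-p^2\, \mathrm{sum}(T_R \setminus H)/(24\kappa)\bigr) \tag{60} \\
  &\le \exp\bigl(-p^2 \epsilon/(6\kappa)\bigr) \;\le\; \frac{\delta}{2M} \tag{61}
\end{align}

(recall from the proof of Claim 24 that $p^2 \epsilon/(6\kappa) > \ln(2M/\delta)$).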

If $\mathrm{sum}(T_R \setminus H) < 4\epsilon$ (so that the expected value of $\mathrm{sum}(T'_R)$ is less than $2\epsilon$) then we can apply the second part of Corollary 3 as we explain next. Observe that by the premise of the lemma, $\mathrm{sum}(T_R) \ge 1 - 20\epsilon$, which is at least 1/2 (recalling that $\epsilon$ is at most a small absolute constant $c$). Consequently, the event "$\mathrm{sum}(T'_R) \ge \frac{1}{2} \cdot \mathrm{sum}(T_R) \cdot (1 + p/2)$" implies the event "$\mathrm{sum}(T'_R) \ge \frac{1}{4}$", and by applying the second part of Corollary 3 we get

\[
\Pr\Bigl[\mathrm{sum}(T'_R) > \tfrac{1}{2}\,\mathrm{sum}(T_R) \cdot (1 + p/2)\Bigr]
\;\le\; \Pr\Bigl[\mathrm{sum}(T'_R) > \tfrac{1}{4}\Bigr]
\;<\; 2^{-1/(4\kappa)} \;<\; \frac{\delta}{2M} \tag{62}
\]

as required. ■



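Now we can prove Lemma 23. Using Claims 24 and 25 we have that with probability at least $1 - \delta/M$,

\[
\mathrm{sum}(T'_L) \ge \tfrac{1}{2} \cdot \mathrm{sum}(T_L) \cdot (1 - p/2)
\quad\text{and}\quad
\mathrm{sum}(T'_R) \le \tfrac{1}{2} \cdot \mathrm{sum}(T_R) \cdot (1 + p/2);
\]

we assume that both these inequalities hold going forth. Since

\[
\frac{\mathrm{sum}(T'_L \cup \{\alpha_\ell\})}{\mathrm{sum}(T' \cup \{\alpha_\ell\})}
= \frac{\mathrm{sum}(T'_L) + \alpha_\ell}{\mathrm{sum}(T') + \alpha_\ell}
\ge \frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T')},
\]

it is sufficient to show that $\frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T')} \ge \mathrm{sum}(T_L)(1 - p)$; we now show this. As $\mathrm{sum}(T') = \mathrm{sum}(T'_L) + \mathrm{sum}(T'_R)$,

\begin{align*}
\frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T')}
  &= \frac{\mathrm{sum}(T'_L)}{\mathrm{sum}(T'_L) + \mathrm{sum}(T'_R)}
   = \frac{1}{1 + \frac{\mathrm{sum}(T'_R)}{\mathrm{sum}(T'_L)}} \\
  &\ge \frac{1}{1 + \frac{(1/2) \cdot \mathrm{sum}(T_R) \cdot (1 + p/2)}{(1/2) \cdot \mathrm{sum}(T_L) \cdot (1 - p/2)}} \\
  &= \frac{\mathrm{sum}(T_L) \cdot (1 - p/2)}{\mathrm{sum}(T_L) \cdot (1 - p/2) + \mathrm{sum}(T_R) \cdot (1 + p/2)} \\
  &\ge \frac{\mathrm{sum}(T_L) \cdot (1 - p/2)}{\mathrm{sum}(T_L) \cdot (1 + p/2) + \mathrm{sum}(T_R) \cdot (1 + p/2)} \\
  &= \mathrm{sum}(T_L) \cdot \frac{1 - p/2}{1 + p/2} \;>\; \mathrm{sum}(T_L) \cdot (1 - p).
\end{align*}

This concludes the proof of Lemma 23. ■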



7 An algorithm for estimating the distance to uniformity 

In this section we describe an algorithm that estimates the distance between a distribution $D$ and the uniform distribution $\mathcal{U}$ by performing $\mathrm{poly}(1/\epsilon)$ PCOND (and SAMP) queries. We start by giving a high level description of the algorithm.

By the definition of the variation distance (and the uniform distribution),

\[
d_{TV}(D, \mathcal{U}) \;=\; \sum_{i : D(i) < 1/N} \Bigl(\frac{1}{N} - D(i)\Bigr). \tag{63}
\]
We define the following function over $[N]$:

\[
\psi^D(i) = 1 - N \cdot D(i) \ \text{ for } D(i) < \frac{1}{N}, \quad\text{and}\quad \psi^D(i) = 0 \ \text{ for } D(i) \ge \frac{1}{N}. \tag{64}
\]

Observe that $\psi^D(i) \in [0, 1]$ for every $i \in [N]$ and

\[
d_{TV}(D, \mathcal{U}) \;=\; \frac{1}{N} \sum_{i=1}^{N} \psi^D(i). \tag{65}
\]

Thus $d_{TV}(D, \mathcal{U})$ can be viewed as an average value of a function whose range is in $[0, 1]$. Since $D$ is fixed throughout this subsection, we shall use the shorthand $\psi(i)$ instead of $\psi^D(i)$. Suppose we were able to compute $\psi(i)$ exactly for any $i$ of our choice. Then we could obtain an estimate $\hat{d}$ of $d_{TV}(D, \mathcal{U})$ to within an additive error of $\epsilon/2$ by simply selecting $\Theta(1/\epsilon^2)$ points in $[N]$ uniformly at random and setting $\hat{d}$ to be the average value of $\psi(\cdot)$ on the sampled points. By an additive Chernoff bound (for an appropriate constant in the $\Theta(\cdot)$ notation), with high constant probability the estimate $\hat{d}$ would deviate by at most $\epsilon/2$ from $d_{TV}(D, \mathcal{U})$.
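The idealized estimator described above (exact access to $\psi$, uniform sampling of points) can be sketched in a few lines of Python. This is only an illustration: it assumes $D$ is given explicitly as a list of probabilities indexed by $0, \dots, N-1$, which is exactly what the real algorithm does not have, and the function names are hypothetical.

```python
import random


def psi(D, i):
    """psi(i) = 1 - N*D(i) if D(i) < 1/N, and 0 otherwise (Equation (64))."""
    N = len(D)
    return max(0.0, 1.0 - N * D[i])


def tv_to_uniform(D):
    """Exact variation distance between D and the uniform distribution."""
    N = len(D)
    return 0.5 * sum(abs(p - 1.0 / N) for p in D)


def estimate_tv(D, num_points, rng):
    """Average psi over num_points uniformly random points of the domain."""
    N = len(D)
    points = [rng.randrange(N) for _ in range(num_points)]
    return sum(psi(D, i) for i in points) / num_points


D = [0.5, 0.25, 0.125, 0.125]
exact = tv_to_uniform(D)                                  # 0.25
average_psi = sum(psi(D, i) for i in range(len(D))) / len(D)   # also 0.25
estimate = estimate_tv(D, 2000, random.Random(1))
```

Here the exact distance and the average of $\psi$ coincide, as Equation (65) states, and the sampled estimate concentrates around the same value.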

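Suppose next that instead of being able to compute $\psi(i)$ exactly, we were able to compute an estimate $\hat{\psi}(i)$ such that $|\hat{\psi}(i) - \psi(i)| \le \epsilon/2$. By using $\hat{\psi}(i)$ instead of $\psi(i)$ for each of the $\Theta(1/\epsilon^2)$ sampled points we would incur an additional additive error of at most $\epsilon/2$. Observe first that for $i$ such that $D(i) \le \epsilon/(2N)$ we have that $\psi(i) \ge 1 - \epsilon/2$, so the estimate $\hat{\psi}(i) = 1$ meets our requirements. Similarly, for $i$ such that $D(i) \ge 1/N$, any estimate $\hat{\psi}(i) \in [0, \epsilon/2]$ can be used. Finally, for $i$ such that $D(i) \in [\epsilon/(2N), 1/N]$, if we can obtain an estimate $\hat{D}(i)$ such that $\hat{D}(i) \in [1 - \epsilon/2, 1 + \epsilon/2] \cdot D(i)$, then we can use $\hat{\psi}(i) = 1 - N \cdot \hat{D}(i)$, since then $|\hat{\psi}(i) - \psi(i)| = N \cdot |\hat{D}(i) - D(i)| \le \epsilon/2$.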

In order to obtain such estimates $\hat{\psi}(\cdot)$, we shall be interested in finding a reference point $x$. Namely, we shall be interested in finding a pair $(x, \hat{D}(x))$ such that $\hat{D}(x) \in [1 - \epsilon/c, 1 + \epsilon/c] \cdot D(x)$ for some sufficiently large constant $c$, and such that $D(x) = \Omega(\epsilon/N)$ and $D(x) = O(1/(\epsilon N))$. In Subsection 7.1 we describe a procedure for finding such a reference point. More precisely, the procedure is required to find such a reference point (with high constant probability) only under a certain condition on $D$. It is not hard to verify (and we show this subsequently), that if this condition is not met, then $d_{TV}(D, \mathcal{U})$ is very close to 1. In order to state the lemma we introduce the following notation. For $\gamma \in [0, 1]$, let

\[
H^D_\gamma \;\stackrel{\mathrm{def}}{=}\; \Bigl\{ i : D(i) \ge \frac{1}{\gamma N} \Bigr\}. \tag{66}
\]

Lemma 26 Given an input parameter $\kappa \in (0, 1/4]$ as well as SAMP and PCOND query access to a distribution $D$, the procedure Find-Reference (Algorithm 12) either returns a pair $(x, \hat{D}(x))$ where $x \in [N]$ and $\hat{D}(x) \in [0, 1]$ or returns No-Pair. The procedure satisfies the following:

1. If $D(H^D_\kappa) \le 1 - \kappa$, then with probability at least 9/10, the procedure returns a pair $(x, \hat{D}(x))$ such that $\hat{D}(x) \in [1 - 2\kappa, 1 + 3\kappa] \cdot D(x)$ and $D(x) \in [\frac{\kappa}{8}, \frac{4}{\kappa}] \cdot \frac{1}{N}$.

2. If $D(H^D_\kappa) > 1 - \kappa$, then with probability at least 9/10, the procedure either returns No-Pair or it returns a pair $(x, \hat{D}(x))$ such that $\hat{D}(x) \in [1 - 2\kappa, 1 + 3\kappa] \cdot D(x)$ and $D(x) \in [\frac{\kappa}{8}, \frac{4}{\kappa}] \cdot \frac{1}{N}$.

The procedure performs $\tilde{O}(1/\kappa^{20})$ PCOND and SAMP queries.

Once we have a reference point $x$ we can use it to obtain an estimate $\hat{\psi}(y)$ for any $y$ of our choice, using the procedure Compare, whose properties are stated in Lemma 2 (see Subsection 3.1).

Algorithm 11: Estimating the Distance to Uniformity
Input: PCOND and SAMP query access to a distribution $D$ and a parameter $\epsilon \in [0, 1]$.

1. Call the procedure Find-Reference (Algorithm 12) with $\kappa$ set to $\epsilon/8$. If it returns No-Pair, then output $\hat{d} = 1$ as the estimate for the distance to uniformity. Otherwise, let $(x, \hat{D}(x))$ be its output.

2. Select a sample $S$ of $\Theta(1/\epsilon^2)$ points uniformly.

3. Let $K = \max\Bigl\{ \frac{2}{N \cdot \hat{D}(x)},\ \frac{\hat{D}(x)}{\epsilon/(4N)} \Bigr\}$.

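4. For each point $y \in S$:

(a) Call Compare$(\{x\}, \{y\}, \kappa, K, \frac{1}{10|S|})$.

(b) If Compare returns High or it returns a value $\rho(y)$ such that $\rho(y) \cdot \hat{D}(x) \ge 1/N$, then set $\hat{\psi}(y) = 0$;

(c) Else, if Compare returns Low or it returns a value $\rho(y)$ such that $\rho(y) \cdot \hat{D}(x) \le \epsilon/(4N)$, then set $\hat{\psi}(y) = 1$;

(d) Else set $\hat{\psi}(y) = 1 - N \cdot \rho(y) \cdot \hat{D}(x)$.

5. Output $\hat{d} = \frac{1}{|S|} \sum_{y \in S} \hat{\psi}(y)$.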
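Steps 4 and 5 amount to a three-way case analysis on the answer of Compare followed by an average. The Python sketch below illustrates just that bookkeeping. It assumes the answers of Compare are already available (either the string "High", the string "Low", or a numeric ratio estimate $\rho(y)$) and does not implement Compare or Find-Reference; in the last case it uses $1 - N \cdot \rho(y) \cdot \hat{D}(x)$, matching the definition of $\psi$ in Equation (64). The function names are hypothetical.

```python
def psi_hat(answer, d_hat_x, N, eps):
    """Turn one Compare answer into an estimate of psi(y).

    answer is "High", "Low", or a float rho(y) approximating D(y)/D(x);
    d_hat_x is the estimate of D(x) returned for the reference point.
    """
    if answer == "High":
        return 0.0
    if answer == "Low":
        return 1.0
    d_hat_y = answer * d_hat_x          # estimate of D(y)
    if d_hat_y >= 1.0 / N:              # y looks at least as heavy as uniform
        return 0.0
    if d_hat_y <= eps / (4 * N):        # y looks negligibly light
        return 1.0
    return 1.0 - N * d_hat_y


def distance_estimate(answers, d_hat_x, N, eps):
    """Step 5: average the per-point estimates over the sample."""
    return sum(psi_hat(a, d_hat_x, N, eps) for a in answers) / len(answers)


# Illustrative values: N = 4, reference estimate 1/8, eps = 1/2.
answers = ["High", "Low", 1.0, 4.0, 0.125]
d_hat = distance_estimate(answers, d_hat_x=0.125, N=4, eps=0.5)
```

With these illustrative answers the per-point estimates are 0, 1, 0.5, 0 and 1, so the average is 0.5.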



Theorem 14 With probability at least 2/3, the estimate $\hat{d}$ returned by Algorithm 11 satisfies $\hat{d} = d_{TV}(D, \mathcal{U}) \pm O(\epsilon)$. The number of queries performed by the algorithm is $\tilde{O}(1/\epsilon^{20})$.

Proof: In what follows we shall use the shorthand $H_\gamma$ instead of $H^D_\gamma$. Let $E_0$ denote the event that the procedure Find-Reference (Algorithm 12) obeys the requirements in Lemma 26, where by Lemma 26 the event $E_0$ holds with probability at least 9/10. Conditioned on $E_0$, the algorithm outputs $\hat{d} = 1$ right after calling the procedure (because the procedure returns No-Pair) only when $D(H_\kappa) > 1 - \kappa = 1 - \epsilon/8$. We claim that in this case $d_{TV}(D, \mathcal{U}) \ge 1 - 2\epsilon/8 = 1 - \epsilon/4$. To verify this, observe that

\[
d_{TV}(D, \mathcal{U}) = \sum_{i : D(i) > 1/N} \Bigl(D(i) - \frac{1}{N}\Bigr) \;\ge\; \sum_{i \in H_\kappa} \Bigl(D(i) - \frac{1}{N}\Bigr) = D(H_\kappa) - \frac{|H_\kappa|}{N} \;\ge\; D(H_\kappa) - \kappa. \tag{67}
\]

Thus, in this case the estimate $\hat{d}$ is as required.

We turn to the case in which $D(H_\kappa) \le 1 - \kappa$ and the procedure Find-Reference returns a pair $(x, \hat{D}(x))$ such that $\hat{D}(x) \in [1 - 2\kappa, 1 + 3\kappa] \cdot D(x)$ and $D(x) \in [\frac{\kappa}{8}, \frac{4}{\kappa}] \cdot \frac{1}{N}$.
We start by defining two more "desirable" events, which hold (simultaneously) with high constant probability, and then show that conditioned on these events holding (as well as $E_0$), the output of the algorithm is as required. Let $E_1$ be the event that the sample $S$ satisfies

\[
\Bigl| \frac{1}{|S|} \sum_{y \in S} \psi(y) - d_{TV}(D, \mathcal{U}) \Bigr| \;\le\; \epsilon/2. \tag{68}
\]

By an additive Chernoff bound, the event $E_1$ holds with probability at least 9/10.

Next, let $E_2$ be the event that all calls to the procedure Compare return answers as specified in Lemma 2. Since Compare is called $|S|$ times, and for each call the probability that it does not return an answer as specified in the lemma is at most $1/(10|S|)$, by the union bound the probability that $E_2$ holds is at least 9/10.

From this point on assume events $E_0$, $E_1$ and $E_2$ all occur, which holds with probability at least $1 - 3/10 \ge 2/3$. Since $E_2$ holds, we get the following.

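1. When Compare returns High for $y \in S$ (so that $\hat{\psi}(y)$ is set to 0) we have that

\[
D(y) \;>\; K \cdot D(x) \;\ge\; \frac{2}{N \cdot \hat{D}(x)} \cdot D(x) \;\ge\; \frac{1}{N}, \tag{69}
\]

implying that $\hat{\psi}(y) = \psi(y)$.

2. When Compare returns Low for $y \in S$ (so that $\hat{\psi}(y)$ is set to 1) we have that

\[
D(y) \;<\; \frac{D(x)}{K} \;\le\; \frac{D(x)}{\hat{D}(x)/(\epsilon/4N)} \;\le\; \frac{\epsilon}{2N}, \tag{70}
\]

implying that $\hat{\psi}(y) \le \psi(y) + \epsilon/2$ (and clearly $\psi(y) \le \hat{\psi}(y)$).

3. When Compare returns a value $\rho(y)$ it holds that $\rho(y) \in [1 - \kappa, 1 + \kappa] \cdot (D(y)/D(x))$, so that $\rho(y) \cdot \hat{D}(x) \in [(1 - \kappa)(1 - 2\kappa), (1 + \kappa)(1 + 3\kappa)] \cdot D(y)$. Since $\kappa = \epsilon/8$, if $\rho(y) \cdot \hat{D}(x) \ge 1/N$ (so that $\hat{\psi}(y)$ is set to 0), then $\psi(y) \le \epsilon/2$; if $\rho(y) \cdot \hat{D}(x) \le \epsilon/(4N)$ (so that $\hat{\psi}(y)$ is set to 1), then $\psi(y) \ge 1 - \epsilon/2$; and otherwise $|\hat{\psi}(y) - \psi(y)| \le \epsilon/2$.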



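It follows that

\[
\hat{d} \;\in\; \Bigl[ \frac{1}{|S|} \sum_{y \in S} \psi(y) - \epsilon/2,\ \frac{1}{|S|} \sum_{y \in S} \psi(y) + \epsilon/2 \Bigr] \;\subseteq\; \bigl[ d_{TV}(D, \mathcal{U}) - \epsilon,\ d_{TV}(D, \mathcal{U}) + \epsilon \bigr] \tag{71}
\]

as required.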



The number of queries performed by the algorithm is the number of queries performed by the procedure Find-Reference, which is $\tilde{O}(1/\epsilon^{20})$, plus $\Theta(1/\epsilon^2)$ times the number of queries performed in each call to Compare. The procedure Compare is called with the parameter $K$, which is bounded by $O(1/\epsilon^2)$, the parameter $\eta$, which is $\Omega(\epsilon)$, and $\delta$, which is $\Omega(\epsilon^2)$. By Lemma 2 the number of queries performed in each call to Compare is $O(\log(1/\epsilon)/\epsilon^4)$. The total number of queries performed is hence $\tilde{O}(1/\epsilon^{20})$. ■






7.1 Finding a reference point 



In this subsection we prove Lemma 26. We start by giving the high-level idea behind the procedure. For a point $x \in [N]$ and $\gamma \in [0, 1]$, let $U^D_\gamma(x)$ be as defined in Equation (11). Since $D$ is fixed throughout this subsection, we shall use the shorthand $U_\gamma(x)$ instead of $U^D_\gamma(x)$. Recall that $\kappa$ is a parameter given to the procedure. Assume we had a point $x$ for which $D(U_\kappa(x)) \ge \kappa^{d_1}$ and $|U_\kappa(x)| \ge \kappa^{d_2} N$ for some constants $d_1$ and $d_2$ (so that necessarily $D(x) = \Omega(\kappa^{d_1}/N)$ and $D(x) = O(1/(\kappa^{d_2} N))$). It is not hard to verify (and we show this in detail subsequently), that if $D(H) \le 1 - \kappa$, then a sample of size $\Theta(1/\mathrm{poly}(\kappa))$ distributed according to $D$ will contain such a point $x$ with high constant probability. Now suppose that we could obtain an estimate $\hat{w}$ of $D(U_\kappa(x))$ such that $\hat{w} \in [1 - \kappa, 1 + \kappa] \cdot D(U_\kappa(x))$ and an estimate $\hat{u}$ of $|U_\kappa(x)|$ such that $\hat{u} \in [1 - \kappa, 1 + \kappa] \cdot |U_\kappa(x)|$. By the definition of $U_\kappa(x)$ we have that $(\hat{w}/\hat{u}) \in [1 - O(\kappa), 1 + O(\kappa)] \cdot D(x)$.
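The reason the ratio of weight to size of a neighborhood approximates $D(x)$ can be seen in a small Python sketch. It assumes that $U_\gamma(x)$ consists of the points $y$ with $D(x)/(1+\gamma) \le D(y) \le (1+\gamma) D(x)$ (the boundaries named in the next paragraph; Equation (11) itself is outside this section), and that $D$ is given explicitly as a list, which the actual procedure does not have. The function names are hypothetical.

```python
def neighborhood(D, x, gamma):
    """Points whose probability is within a (1 + gamma) factor of D(x)."""
    lo, hi = D[x] / (1 + gamma), (1 + gamma) * D[x]
    return [y for y in range(len(D)) if lo <= D[y] <= hi]


def weight_over_size(D, x, gamma):
    """Exact D(U_gamma(x)) / |U_gamma(x)|, the average weight in the neighborhood."""
    U = neighborhood(D, x, gamma)
    return sum(D[y] for y in U) / len(U)


D = [0.1, 0.11, 0.09, 0.3, 0.4]
U = neighborhood(D, 0, 0.25)          # points with probability in [0.08, 0.125]
ratio = weight_over_size(D, 0, 0.25)  # average of 0.1, 0.11 and 0.09
```

Because every point of the neighborhood has probability within a $(1+\gamma)$ factor of $D(x)$, so does their average; multiplicative errors of $1 \pm \kappa$ in the estimates of the weight and the size then only add an $O(\kappa)$ relative error.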

Obtaining good estimates of $D(U_\kappa(x))$ and $|U_\kappa(x)|$ (for $x$ such that both $|U_\kappa(x)|$ and $D(U_\kappa(x))$ are sufficiently large) might be infeasible. This is due to the possible existence of many points $y$ for which $D(y)$ is very close to $(1 + \kappa) D(x)$ or $D(x)/(1 + \kappa)$, which define the boundaries of the set $U_\kappa(x)$. For such points it is not possible to efficiently distinguish between those among them that belong to $U_\kappa(x)$ (so that they are within the borders of the set) and those that do not belong to $U_\kappa(x)$ (so that they are just outside the borders of the set). However, for our purposes it suffices to estimate the weight and size of some set $U_\alpha(x)$ such that $\alpha \ge \kappa$ (so that $U_\kappa(x) \subseteq U_\alpha(x)$) and $\alpha$ is not much larger than $\kappa$ (e.g., $\alpha \le 2\kappa$). To this end we can apply Procedure Estimate-Neighborhood (see Subsection 3.2), which (conditioned on $D(U_\kappa(x))$ being above a certain threshold) returns a pair $(\hat{w}(x), \alpha)$ such that $\hat{w}(x)$ is a good estimate of $D(U_\alpha(x))$. Furthermore, $\alpha$ is such that for $\alpha'$ slightly larger than $\alpha$, the weight of $U_{\alpha'}(x) \setminus U_\alpha(x)$ is small, allowing us to obtain also a good estimate $\hat{\mu}(x)$ of $|U_\alpha(x)|/N$.

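Proof of Lemma 26: We first introduce the following notation.

$$L \stackrel{\mathrm{def}}{=} \left\{ i : D(i) < \frac{\kappa}{2N} \right\}, \qquad M \stackrel{\mathrm{def}}{=} \left\{ i : \frac{\kappa}{2N} \le D(i) < \frac{1}{\kappa N} \right\}. \qquad (72)$$

Let $H$ denote the set defined in Equation (66). Observe that $D(L) < \kappa/2$, so that if
$D(H) \le 1 - \kappa$, then $D(M) \ge \kappa/2$. Consider further partitioning the set $M$ of "medium weight"
points into buckets $M_1, \dots, M_r$ where $r = \log_{1+\kappa}(2/\kappa^2) = \Theta(\log(1/\kappa)/\kappa)$ and the bucket $M_j$ is
defined as follows:

$$M_j \stackrel{\mathrm{def}}{=} \left\{ i : (1+\kappa)^{j-1} \cdot \frac{\kappa}{2N} \le D(i) < (1+\kappa)^{j} \cdot \frac{\kappa}{2N} \right\}. \qquad (73)$$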
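
The bucketing can be made concrete with a short sketch. It assumes the medium-weight range runs from $\kappa/(2N)$ up to (but excluding) $1/(\kappa N)$ and that consecutive bucket boundaries differ by a factor $(1+\kappa)$; the function name `bucket_index` is hypothetical, and floating-point logarithms are used, so values lying exactly on a bucket boundary may be rounded either way.

```python
import math

def bucket_index(p, N, kappa):
    """Return j >= 1 with (1+kappa)^(j-1) * kappa/(2N) <= p < (1+kappa)^j * kappa/(2N),
    or None when p is outside the medium-weight range [kappa/(2N), 1/(kappa*N))."""
    low, high = kappa / (2 * N), 1 / (kappa * N)
    if not (low <= p < high):
        return None
    return math.floor(math.log(p / low, 1 + kappa)) + 1
```

Because each bucket spans a single $(1+\kappa)$ factor, any two points in the same bucket have weights within a $(1+\kappa)$ factor of each other, which is what later places a whole bucket inside one neighborhood $U_\kappa(x^*)$.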
We consider the following "desirable" events. 

1. Let $E_1$ be the event that, conditioned on the existence of a bucket $M_j$ such that $D(M_j) \ge
\kappa/2r = \Omega(\kappa^2/\log(1/\kappa))$, there exists a point $x^* \in X$ that belongs to $M_j$. By the setting of
the size of the sample $X$, the (conditional) event $E_1$ holds with probability at least $1 - 1/40$.

2. Let $E_2$ be the event that all calls to Estimate-Neighborhood return an output as specified
by Lemma 3. By Lemma 3, the setting of the confidence parameter $\delta$ in each call and a union
bound over all $|X|$ calls, $E_2$ holds with probability at least $1 - 1/40$.

Algorithm 12: Procedure Find-Reference

Input: PCOND and SAMP query access to a distribution $D$ and a parameter $\kappa \in (0, 1/4]$

1. Select a sample $X$ of $\Theta(\log(1/\kappa)/\kappa^2)$ points distributed according to $D$.

2. For each $x \in X$ do the following:

(a) Call Estimate-Neighborhood with the parameters $\kappa$ as in the input to
Find-Reference, $\beta = \kappa^2/(40\log(1/\kappa))$, $\eta = \kappa$, and $\delta = 1/(40|X|)$. Let
$\theta = \kappa\eta\beta\delta/64 = \Theta(\kappa^6/\log^2(1/\kappa))$ (as in Find-Reference).

(b) If Estimate-Neighborhood returns a pair $(\hat{w}(x), \alpha(x))$ such that
$\hat{w}(x) < \kappa^2/20\log(1/\kappa)$, then go to Line 2 and continue with next $x \in X$.

(c) Select a sample $Y_x$ of size $\Theta(\log^2(1/\kappa)/\kappa^5)$ distributed uniformly.

(d) For each $y \in Y_x$, call Compare($\{x\}$, $\{y\}$, $\theta/4$, $4$, $1/(40|X||Y_x|)$), and let the output be
denoted $\rho_x(y)$.

(e) Let $\hat{\mu}(x)$ be the fraction of occurrences of $y \in Y_x$ such that
$\rho_x(y) \in [1/(1+\alpha+\theta/2),\ 1+\alpha+\theta/2]$.

(f) Set $\hat{D}(x) = \hat{w}(x)/(\hat{\mu}(x)N)$.

3. If for some point $x \in X$ we have $\hat{w}(x) \ge \kappa^2/20\log(1/\kappa)$, $\hat{\mu}(x) \ge \kappa^2/20\log(1/\kappa)$, and
$\kappa/4N \le \hat{D}(x) \le 2/(\kappa N)$, then return $(x, \hat{D}(x))$. Otherwise return No-Pair.



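3. Let $E_3$ be the event that for each $x \in X$ we have the following.

(a) If $\frac{|U_{\alpha(x)}(x)|}{N} \ge \frac{\kappa^2}{40\log(1/\kappa)}$, then $\frac{|Y_x \cap U_{\alpha(x)}(x)|}{|Y_x|} \in [1-\eta/2,\ 1+\eta/2]\,\frac{|U_{\alpha(x)}(x)|}{N}$;
if $\frac{|U_{\alpha(x)}(x)|}{N} < \frac{\kappa^2}{40\log(1/\kappa)}$, then $\frac{|Y_x \cap U_{\alpha(x)}(x)|}{|Y_x|} < \frac{\kappa^2}{30\log(1/\kappa)}$.

(b) Let $\Delta_{\alpha(x),\theta}(x) \stackrel{\mathrm{def}}{=} U_{\alpha(x)+\theta}(x) \setminus U_{\alpha(x)}(x)$ (where $\theta$ is as specified by the algorithm).
If $\frac{|\Delta_{\alpha(x),\theta}(x)|}{N} \ge \frac{\kappa^2}{240\log(1/\kappa)}$, then $\frac{|Y_x \cap \Delta_{\alpha(x),\theta}(x)|}{|Y_x|} \le 2\,\frac{|\Delta_{\alpha(x),\theta}(x)|}{N}$;
if $\frac{|\Delta_{\alpha(x),\theta}(x)|}{N} < \frac{\kappa^2}{240\log(1/\kappa)}$, then $\frac{|Y_x \cap \Delta_{\alpha(x),\theta}(x)|}{|Y_x|} < \frac{\kappa^2}{120\log(1/\kappa)}$.

By the size of each set $Y_x$ and a union bound over all $x \in X$, the event $E_3$ holds with
probability at least $1 - 1/40$.

4. Let $E_4$ be the event that all calls to Compare return an output as specified by Lemma 2. By
Lemma 2, the setting of the confidence parameter $\delta$ in each call and a union bound over all
(at most) $|X| \cdot |Y|$ calls, $E_4$ holds with probability at least $1 - 1/40$.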

Assuming events $E_1$-$E_4$ all hold (which occurs with probability at least 9/10) we have the following.

1. By $E_2$, for each $x \in X$ such that $\hat{w}(x) \ge \kappa^2/20\log(1/\kappa)$ (so that $x$ may be selected for the
output of the procedure) we have that $D(U_{\alpha(x)}(x)) \ge \kappa^2/40\log(1/\kappa)$.
The event $E_2$ also implies that for each $x \in X$ we have that $D(\Delta_{\alpha(x),\theta}(x)) \le \eta\beta/16 \le
(\eta/16) \cdot D(U_{\alpha(x)}(x))$, so that

$$\frac{|\Delta_{\alpha(x),\theta}(x)|}{N} \le \frac{\eta}{16}\,(1+\alpha(x))(1+\alpha(x)+\theta)\,\frac{|U_{\alpha(x)}(x)|}{N} \le \frac{\eta}{6} \cdot \frac{|U_{\alpha(x)}(x)|}{N}. \qquad (74)$$

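2. Consider any $x \in X$ such that $\hat{w}(x) \ge \kappa^2/20\log(1/\kappa)$. Let $T_x \stackrel{\mathrm{def}}{=} \{y : \rho_x(y) \in [1/(1+\alpha+\theta/2),\ 1+\alpha+\theta/2]\}$,
so that $\hat{\mu}(x) = |T_x|/|Y_x|$. By $E_4$, for each $y \in Y_x \cap U_{\alpha(x)}(x)$ we have that
$\rho_x(y) \le (1+\alpha)(1+\theta/4) \le (1+\alpha+\theta/2)$ and $\rho_x(y) \ge (1+\alpha)^{-1}(1-\theta/4) \ge (1+\alpha+\theta/2)^{-1}$,
so that $y \in T_x$. On the other hand, for each $y \in Y_x \setminus U_{\alpha(x)+\theta}(x)$ we have that $\rho_x(y) >
(1+\alpha+\theta)(1-\theta/4) \ge 1+\alpha+\theta/2$ or $\rho_x(y) < (1+\alpha+\theta)^{-1}(1+\theta/4) \le (1+\alpha+\theta/2)^{-1}$, so
that $y \notin T_x$. It follows that

$$Y_x \cap U_{\alpha(x)}(x) \ \subseteq\ T_x \ \subseteq\ Y_x \cap \left(U_{\alpha(x)}(x) \cup \Delta_{\alpha(x),\theta}(x)\right). \qquad (75)$$

By $E_3$, when $\hat{\mu}(x) = |T_x|/|Y_x| \ge \kappa^2/20\log(1/\kappa)$, then necessarily $\hat{\mu}(x) \in [1-\eta,\ 1+\eta]\,|U_{\alpha(x)}(x)|/N$.
To verify this consider the following cases.

(a) If $\frac{|U_{\alpha(x)}(x)|}{N} \ge \frac{\kappa^2}{40\log(1/\kappa)}$, then (by the left-hand side of Equation (75) and the definition of
$E_3$) we get that $\hat{\mu}(x) \ge (1-\eta/2)\frac{|U_{\alpha(x)}(x)|}{N}$, and (by the right-hand side of Equation (75),
Equation (74) and $E_3$) we get that $\hat{\mu}(x) \le (1+\eta/2)\frac{|U_{\alpha(x)}(x)|}{N} + 2(\eta/6)\frac{|U_{\alpha(x)}(x)|}{N} < (1+\eta)\frac{|U_{\alpha(x)}(x)|}{N}$.

(b) If $\frac{|U_{\alpha(x)}(x)|}{N} < \frac{\kappa^2}{40\log(1/\kappa)}$, then (by the right-hand side of Equation (75), Equation (74)
and $E_3$) we get that $\hat{\mu}(x) < \frac{\kappa^2}{30\log(1/\kappa)} + \frac{\kappa^2}{120\log(1/\kappa)} < \kappa^2/20\log(1/\kappa)$.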

3. If $D(H) \le 1-\kappa$, so that $D(M) \ge \kappa/2$, then there exists at least one bucket $M_j$ such that
$D(M_j) \ge \kappa/2r = \Omega(\kappa^2/\log(1/\kappa))$. By $E_1$, the sample $X$ contains a point $x^* \in M_j$. By the
definition of the buckets, for this point $x^*$ we have that $D(U_\kappa(x^*)) \ge \kappa/2r \ge \kappa^2/(10\log(1/\kappa))$
and $|U_\kappa(x^*)| \ge (\kappa^2/2r)N \ge \kappa^3 N/(10\log(1/\kappa))$.



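By the first two items above and the setting $\eta = \kappa$ we have that for each $x$ such that $\hat{w}(x) \ge
\kappa^2/20\log(1/\kappa)$ and $\hat{\mu}(x) \ge \kappa^2/20\log(1/\kappa)$,

$$\hat{D}(x) \in \left[\frac{1-\kappa}{1+\kappa},\ \frac{1+\kappa}{1-\kappa}\right] D(x) \subset [1-2\kappa,\ 1+3\kappa]\,D(x).$$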



Thus, if the algorithm outputs a pair $(x, \hat{D}(x))$ then it satisfies the condition stated in both items
of the lemma. This establishes the second item in the lemma. By combining all three items we
get that if $D(H) \le 1-\kappa$ then the algorithm outputs a pair $(x, \hat{D}(x))$ (where possibly, but not
necessarily, $x = x^*$), and the first item is established as well.



Turning to the query complexity, the total number of PCOND queries performed in the
$|X| = O(\log(1/\kappa)/\kappa^2)$ calls to Estimate-Neighborhood is $O(1/\kappa^{18})$,

and the total number of PCOND queries performed in the calls to Compare (for at most all pairs
$x \in X$ and $y \in Y_x$) is $O(1/\kappa^{20})$. ∎

8 Algorithms and lower bounds for testing uniformity with ICOND$_D$



In this section we consider ICOND algorithms for testing whether an unknown distribution $D$ over
$[N]$ is the uniform distribution versus $\epsilon$-far from uniform. Our results show that ICOND algorithms
are not as powerful as PCOND algorithms for this basic testing problem; we give a $\mathrm{poly}(\log N, 1/\epsilon)$-query
ICOND$_D$ algorithm, and prove that any ICOND$_D$ algorithm must make $\tilde{\Omega}(\log N)$ queries.

8.1 An $\tilde{O}(\log^3 N/\epsilon^3)$-query ICOND$_D$ algorithm for testing uniformity

In this subsection we describe an algorithm ICOND$_D$-Test-Uniform and prove the following
theorem:

Theorem 15 ICOND$_D$-Test-Uniform is a $\tilde{O}(\log^3 N/\epsilon^3)$-query ICOND$_D$ testing algorithm for
uniformity, i.e. it outputs ACCEPT with probability at least 2/3 if $D = \mathcal{U}$ and outputs REJECT with
probability at least 2/3 if $d_{\mathrm{TV}}(D, \mathcal{U}) \ge \epsilon$.

Intuition. Recall that, as mentioned in Section 4.1, any distribution $D$ which is $\epsilon$-far from
uniform must put $\Omega(\epsilon)$ probability mass on "significantly heavy" elements (that is, if we define
$H' = \{ h \in [N] \mid D(h) \ge \frac{1}{N} + \frac{\epsilon}{4N} \}$, it must hold that $D(H') \ge \epsilon/4$). Consequently a sample of
$O(1/\epsilon)$ points drawn from $D$ will contain such a point with high probability. Thus, a natural
approach to testing whether $D$ is uniform is to devise a procedure that, given an input point $y$, can
distinguish between the case that $y \in H'$ and the case that $D(y) = 1/N$ (as it is when $D = \mathcal{U}$).

We give such a procedure, which uses the ICOND$_D$ oracle to perform a sort of binary search over
intervals. The procedure successively "weighs" narrower and narrower intervals until it converges
on the single point $y$. In more detail, we consider the interval tree whose root is the whole domain
$[N]$, with two children $\{1, \dots, N/2\}$ and $\{N/2+1, \dots, N\}$, and so on, with a single point at each of
the $N$ leaves. Our algorithm starts at the root of the tree and goes down the path that corresponds
to $y$; at each child node it uses Compare to compare the weight of the current node to the weight of its
sibling under $D$. If at any point the estimate deviates significantly from the value it should have if
$D$ were uniform (namely the weights should be essentially equal, with slight deviations because of
even/odd issues), then the algorithm rejects. Assuming the algorithm does not reject, it provides a
$(1 \pm O(\epsilon))$-accurate multiplicative estimate of $D(y)$, and the algorithm checks whether this estimate
is sufficiently close to $1/N$ (rejecting if this is not the case). If no point in a sample of $O(1/\epsilon)$ points
(drawn according to $D$) causes rejection, then the algorithm accepts.

The algorithm we use to perform the "binary search" described above is Algorithm 13, Binary-Descent.
We begin by proving correctness for it:

Lemma 27 Suppose the algorithm Binary-Descent is run with inputs $\epsilon \in (0,1]$, $a = 1$, $b = N$,
and $y \in [N]$, and is provided ICOND oracle access to distribution $D$ over $[N]$. It performs
$\tilde{O}(\log^3 N/\epsilon^2)$ queries and either outputs a value $\hat{D}(y)$ or REJECT, where the following holds:

Algorithm 13: Binary-Descent

Input: parameter $\epsilon > 0$; integers $1 \le a \le b \le N$; $y \in [a, b]$; query access to ICOND$_D$ oracle

1: if $a = b$ then
2:   return 1
3: end if
4: Let $c = \lfloor \frac{a+b}{2} \rfloor$; $\Delta = (b-a+1)/2$.
5: if $y \le c$ then
6:   Define $I_y = \{a, \dots, c\}$, $\bar{I}_y = \{c+1, \dots, b\}$ and $\rho = \lceil\Delta\rceil/\lfloor\Delta\rfloor$
7: else
8:   Define $\bar{I}_y = \{a, \dots, c\}$, $I_y = \{c+1, \dots, b\}$ and $\rho = \lfloor\Delta\rfloor/\lceil\Delta\rceil$
9: end if
10: Call Compare on $I_y$, $\bar{I}_y$ with parameters $\eta = \frac{\epsilon}{48\log N}$, $K = 2$, $\delta = \frac{\epsilon}{100(1+\log N)}$ to get an
    estimate $\hat{\rho}$ of $D(I_y)/D(\bar{I}_y)$
11: if $\hat{\rho} \notin [1 - \frac{\epsilon}{48\log N},\ 1 + \frac{\epsilon}{48\log N}] \cdot \rho$ (this includes the case that $\hat{\rho}$ is High or Low) then
12:   return REJECT
13: end if
14: Call recursively Binary-Descent on input ($\epsilon$, the endpoints of $I_y$, $y$);
15: if Binary-Descent returns a value $\nu$ then
16:   return $\frac{\hat{\rho}}{1+\hat{\rho}} \cdot \nu$
17: else
18:   return REJECT
19: end if



Algorithm 14: ICOND$_D$-Test-Uniform

Input: error parameter $\epsilon > 0$; query access to ICOND$_D$ oracle

1: Draw $t = \frac{20}{\epsilon}$ points $y_1, \dots, y_t$ from SAMP$_D$.
2: for $j = 1$ to $t$ do
3:   Call Binary-Descent($\epsilon$, $1$, $N$, $y_j$) and return REJECT if it rejects, otherwise let $\hat{d}_j$ be the
     value it returns as its estimate of $D(y_j)$
4:   if $\hat{d}_j \notin [1 - \frac{\epsilon}{12},\ 1 + \frac{\epsilon}{12}] \cdot \frac{1}{N}$ then
5:     return REJECT
6:   end if
7: end for
8: return ACCEPT

1. if $D(y) \ge \frac{1}{N} + \frac{\epsilon}{4N}$, then with probability at least $1 - \frac{\epsilon}{100}$ the procedure either outputs a value
$\hat{D}(y) \in [1-\epsilon/12,\ 1+\epsilon/12]\,D(y)$ or REJECT;

2. if $D = \mathcal{U}$, then with probability at least $1 - \frac{\epsilon}{100}$ the procedure outputs a value $\hat{D}(y) \in [1 -
\epsilon/12,\ 1+\epsilon/12] \cdot \frac{1}{N}$.

Proof of Lemma 27: The claimed query bound is easily verified, since the recursion depth is
at most $1 + \log N$ and the only queries made are during calls to Compare, each of which performs
$O(\log(1/\delta)/\eta^2) = \tilde{O}(\log^2 N/\epsilon^2)$ queries.

Let $E_0$ be the event that all calls to Compare satisfy the conditions in Lemma 2; since each of
them succeeds with probability at least $1 - \delta = 1 - \frac{\epsilon}{100(1+\log N)}$, a union bound shows that $E_0$ holds
with probability at least $1 - \epsilon/100$. We hereafter condition on $E_0$.

We first prove the second part of the lemma where $D = \mathcal{U}$. Fix any specific recursive call, say
the $j$-th, during the execution of the procedure. The intervals $I_y^{(j)}$, $\bar{I}_y^{(j)}$ used in that execution of the
algorithm are easily seen to satisfy $D(I_y^{(j)})/D(\bar{I}_y^{(j)}) \in [1/K, K]$ (for $K = 2$), so by event $E_0$ it must be
the case that Compare returns an estimate $\hat{\rho}_j \in [1 - \frac{\epsilon}{48\log N},\ 1 + \frac{\epsilon}{48\log N}] \cdot D(I_y^{(j)})/D(\bar{I}_y^{(j)})$. Since
$D = \mathcal{U}$, we have that $D(I_y^{(j)})/D(\bar{I}_y^{(j)}) = \rho^{(j)}$, so the overall procedure returns a numerical value
rather than REJECT.

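Let $M = \lceil \log N \rceil$ be the number of recursive calls (i.e., the number of executions of Line 14).
Note that we can write $D(y)$ as a product

$$D(y) = \prod_{j=1}^{M} \frac{D(I_y^{(j)})}{D(I_y^{(j)}) + D(\bar{I}_y^{(j)})} = \prod_{j=1}^{M} \frac{D(I_y^{(j)})/D(\bar{I}_y^{(j)})}{D(I_y^{(j)})/D(\bar{I}_y^{(j)}) + 1}.$$

We next observe that for any $0 < \epsilon' < 1$ and $\rho, d > 0$, if $\rho \in [1-\epsilon',\ 1+\epsilon']\,d$ then we have
$\frac{\rho}{\rho+1} \in [1 - \frac{\epsilon'}{2},\ 1+\epsilon']\,\frac{d}{d+1}$ (by straightforward algebra). Applying this $M$ times, we get

$$\prod_{j=1}^{M} \frac{\hat{\rho}_j}{\hat{\rho}_j + 1} \in \left[\left(1 - \frac{\epsilon}{96\log N}\right)^{M},\ \left(1 + \frac{\epsilon}{48\log N}\right)^{M}\right] \cdot \prod_{j=1}^{M} \frac{D(I_y^{(j)})/D(\bar{I}_y^{(j)})}{D(I_y^{(j)})/D(\bar{I}_y^{(j)}) + 1}$$
$$= \left[\left(1 - \frac{\epsilon}{96\log N}\right)^{M},\ \left(1 + \frac{\epsilon}{48\log N}\right)^{M}\right] \cdot D(y) \ \subseteq\ \left[1 - \frac{\epsilon}{12},\ 1 + \frac{\epsilon}{12}\right] \cdot D(y).$$

Since $\prod_{j=1}^{M} \frac{\hat{\rho}_j}{\hat{\rho}_j + 1}$ is the value that the procedure outputs, the second part of the lemma is proved.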



The proof of the first part of the lemma is virtually identical. The only difference is that now
it is possible that Compare outputs High or Low at some call (since $D$ is not uniform it need
not be the case that $D(I_y^{(j)})/D(\bar{I}_y^{(j)}) = \rho^{(j)}$), but this is not a problem for (i) since in that case
Binary-Descent would output REJECT. ∎
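
To make the descent concrete, here is a minimal sketch of Binary-Descent in which the call to Compare is replaced by the exact ratio of interval weights, so there is no sampling error and no query counting. This idealisation, the use of base-2 logarithms, and the representation of $D$ as a Python list are assumptions made for illustration only; `None` stands for REJECT.

```python
import math

def binary_descent(D, eps, a, b, y, N=None):
    """Idealised Binary-Descent on the 1-indexed interval [a, b].

    D[i-1] is the weight of point i. Returns the product of rho/(1+rho)
    along the path to y, or None (REJECT) if some ratio is too far from
    the value it would have under the uniform distribution."""
    if N is None:
        N = len(D)
    if a == b:
        return 1.0
    c = (a + b) // 2
    delta = (b - a + 1) / 2
    if y <= c:
        own, sib = (a, c), (c + 1, b)
        rho = math.ceil(delta) / math.floor(delta)
    else:
        own, sib = (c + 1, b), (a, c)
        rho = math.floor(delta) / math.ceil(delta)
    w_own = sum(D[own[0] - 1:own[1]])
    w_sib = sum(D[sib[0] - 1:sib[1]])
    if w_sib == 0:
        return None  # Compare would report an extreme ratio
    rho_hat = w_own / w_sib  # exact ratio standing in for Compare
    tol = eps / (48 * math.log2(N))
    if not ((1 - tol) * rho <= rho_hat <= (1 + tol) * rho):
        return None
    v = binary_descent(D, eps, own[0], own[1], y, N)
    return None if v is None else rho_hat / (1 + rho_hat) * v
```

With exact ratios the returned value is exactly the telescoping product for $D(y)$, which is why a uniform input yields $1/N$ even when $N$ is not a power of two.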

See Algorithm 14 for a description of the testing algorithm ICOND$_D$-Test-Uniform. We now
prove Theorem 15.

Proof of Theorem 15: Define $E_1$ to be the event that all calls to Binary-Descent satisfy the
conclusions of Lemma 27. With a union bound over all these $t = 20/\epsilon$ calls, we have $\Pr[E_1] \ge 8/10$.

Completeness: Suppose $D = \mathcal{U}$, and condition again on $E_1$. Since this implies that Binary-Descent
will always return a value, the only case in which ICOND$_D$-Test-Uniform might reject is
by reaching Line 5. However, since it is the case that every value $\hat{d}_j$ returned by the procedure
satisfies $\hat{D}(y) \in [1-\epsilon/12,\ 1+\epsilon/12] \cdot \frac{1}{N}$, this can never happen.

Soundness: Suppose $d_{\mathrm{TV}}(D, \mathcal{U}) \ge \epsilon$. Let $E_2$ be the event that at least one of the $y_j$'s drawn
in Line 1 belongs to $H'$. As $D(H') \ge \epsilon/4$, we have $\Pr[E_2] \ge 1 - (1-\epsilon/4)^{20/\epsilon} \ge 9/10$.
Conditioning on both $E_1$ and $E_2$, for such a $y_j$, one of the two cases below holds:

• either the call to Binary-Descent outputs REJECT and ICOND$_D$-Test-Uniform
outputs REJECT;

• or a value $\hat{d}_j$ is returned, for which $\hat{d}_j \ge (1 - \frac{\epsilon}{12})(1 + \frac{\epsilon}{4}) \cdot \frac{1}{N} > (1+\epsilon/12)/N$ (where we
used the fact that $E_1$ holds); and ICOND$_D$-Test-Uniform reaches Line 5 and rejects.

Since $\Pr[E_1 \cap E_2] \ge 7/10$, ICOND$_D$-Test-Uniform is correct with probability at least 2/3.
Finally, the claimed query complexity directly follows from the $t = O(1/\epsilon)$ calls to Binary-Descent,
each of which makes $\tilde{O}(\log^3 N/\epsilon^2)$ queries to ICOND$_D$. ∎
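
The two numerical facts used in the soundness argument can be checked directly. The sketch below is only a sanity check with a hypothetical helper name: it evaluates the probability that all $t = 20/\epsilon$ samples miss a set of mass $\epsilon/4$, and the gap between $(1-\epsilon/12)(1+\epsilon/4)$ and $1+\epsilon/12$.

```python
def soundness_constants(eps):
    """Return (probability that 20/eps samples all miss a set of mass eps/4,
    gap between (1 - eps/12)(1 + eps/4) and 1 + eps/12)."""
    t = 20 / eps
    miss_prob = (1 - eps / 4) ** t
    gap = (1 - eps / 12) * (1 + eps / 4) - (1 + eps / 12)
    return miss_prob, gap
```

The miss probability is at most $e^{-5}$ for every $\epsilon \in (0,1]$, well below $1/10$, and the gap equals $\epsilon/12 - \epsilon^2/48$, which is positive whenever $\epsilon < 4$.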

8.2 An $\Omega(\log N/\log\log N)$ lower bound for ICOND$_D$ algorithms that test uniformity

This subsection proves that any ICOND$_D$ algorithm that $\epsilon$-tests uniformity even for constant $\epsilon$ must
have query complexity $\tilde{\Omega}(\log N)$. This shows that our algorithm in the previous subsection is not
too far from optimal, and sheds light on a key difference between ICOND and PCOND oracles.

Theorem 16 Fix $\epsilon = 1/3$. Any ICOND$_D$ algorithm for testing whether $D = \mathcal{U}$ versus
$d_{\mathrm{TV}}(D, \mathcal{U}) \ge \epsilon$ must make $\Omega(\log N/\log\log N)$ queries.

To prove this lower bound we define a probability distribution $\mathcal{P}_{\mathrm{No}}$ over possible "No"-distributions
(i.e. distributions that have variation distance at least 1/3 from $\mathcal{U}$). A distribution
drawn from $\mathcal{P}_{\mathrm{No}}$ is constructed as follows: first, we partition $[N]$ into $b = 2^X$ consecutive intervals
of the same size $\Delta = \frac{N}{2^X}$, which we refer to as "blocks", where $X$ is a random variable distributed
uniformly on the set $\{\frac{1}{3}\log N,\ \frac{1}{3}\log N + 1,\ \dots,\ \frac{2}{3}\log N\}$. Once the block size $\Delta$ is determined, a
random offset $y$ is drawn uniformly at random in $[N]$, and all block endpoints are shifted by $y$
modulo $N$ (intuitively, this prevents the testing algorithm from "knowing" a priori that specific
points are endpoints of blocks). Finally, independently for each block, a fair coin is thrown to determine
its profile: with probability 1/2, each point in the first half of the block will have probability
weight $\frac{1-2\epsilon}{N}$ and each point in the second half will have probability $\frac{1+2\epsilon}{N}$ (such a block is said to
be a low-high block, with profile $\downarrow\uparrow$). With probability 1/2 the reverse is true: each point in the
first half has probability $\frac{1+2\epsilon}{N}$ and each point in the second half has probability $\frac{1-2\epsilon}{N}$ (a high-low
block $\uparrow\downarrow$). It is clear that each distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ defined in this way indeed has
$d_{\mathrm{TV}}(D, \mathcal{U}) = \epsilon$.

To summarize, each "No"-distribution $D$ in the support of $\mathcal{P}_{\mathrm{No}}$ is parameterized by $(b+2)$
parameters: its block size $\Delta$, offset $y$, and profile $\vartheta \in \{\downarrow\uparrow, \uparrow\downarrow\}^b$. Note that regardless of the profile
vector, each block always has weight exactly $\Delta/N$.
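
The construction can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's exact specification: $N$ is taken to be a power of two, $X$ is drawn from the integers between $\lceil \log N / 3 \rceil$ and $\lfloor 2\log N / 3 \rfloor$, the two weights are taken to be $(1 \pm 2\epsilon)/N$, and points are indexed from 0. The function name and the string labels for profiles are hypothetical.

```python
import random
from fractions import Fraction

def draw_no_distribution(log_n, eps=Fraction(1, 3), rng=random):
    """Draw one "No"-distribution over N = 2**log_n points (0-indexed)."""
    N = 2 ** log_n
    X = rng.randint(-(-log_n // 3), (2 * log_n) // 3)  # assumed range for X
    b = 2 ** X                    # number of blocks
    delta = N // b                # block size (even, since X < log_n)
    offset = rng.randrange(N)     # random cyclic shift of block endpoints
    profile = [rng.choice(("low-high", "high-low")) for _ in range(b)]
    low, high = (1 - 2 * eps) / N, (1 + 2 * eps) / N
    D = [None] * N
    for k in range(b):
        for r in range(delta):
            first_half = r < delta // 2
            is_low = first_half == (profile[k] == "low-high")
            D[(offset + k * delta + r) % N] = low if is_low else high
    return D, delta, offset, profile
```

Using `Fraction` keeps the arithmetic exact, so the variation distance to uniform and the weight of each block can be compared with equality rather than with a tolerance.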

We note that while there is only one "Yes" -distribution U, it will sometimes be convenient for 
the analysis to think of U as resulting from the same initial process of picking a block size and 
offset, but without the subsequent choice of a profile vector. We sometimes refer to this as the 
"fake construction" of the uniform distribution U (the reason for this will be clear later). 

The proof of Theorem 16 will be carried out in two steps. First we shall restrict the analysis
to non-adaptive algorithms, and prove the lower bound for such algorithms. This result will then
be extended to the general setting by introducing (similarly to Section 5.2) the notion of a query-faking
algorithm, and reducing the behavior of adaptive algorithms to non-adaptive ones through
an appropriate sequence of such query-faking algorithms.

Before proceeding, we define the transcript of the interaction between an algorithm and an
ICOND$_D$ oracle. Informally, the transcript captures the entire history of interaction between the
algorithm and the ICOND$_D$ oracle during the whole sequence of queries.

Definition 7 Fix any (possibly adaptive) testing algorithm $A$ that queries an ICOND$_D$ oracle. The
transcript of $A$ is a sequence $T = (I_\ell, s_\ell)_{\ell \in \mathbb{N}^*}$ of pairs, where $I_\ell$ is the $\ell$-th interval provided by the
algorithm as input to ICOND$_D$, and $s_\ell \in I_\ell$ is the response that ICOND$_D$ provides to this query.
Given a transcript $T$, we shall denote by $T|_k$ the partial transcript induced by the first $k$ queries,
i.e. $T|_k = (I_\ell, s_\ell)_{1 \le \ell \le k}$.

Equipped with these definitions, we now turn to proving the theorem in the special case of
non-adaptive testing algorithms. Observe that there are three different sources of randomness in
our arguments: (i) the draw of the "No"-instance from $\mathcal{P}_{\mathrm{No}}$; (ii) the internal randomness of the
testing algorithm; and (iii) the random draws from the oracle. Whenever there could be confusion
we shall explicitly state which probability space is under discussion.



8.2.1 Against non-adaptive algorithms 

Throughout this subsection we assume that $A$ is an arbitrary, fixed, non-adaptive, randomized
algorithm that makes exactly $q \le \tau \cdot \frac{\log N}{\log\log N}$ queries to ICOND$_D$; here $\tau > 0$ is some absolute
constant that will be determined in the course of the analysis. (The assumption that $A$ always
makes exactly $q$ queries is without loss of generality since if in some execution the algorithm makes
$q' < q$ queries, it can perform additional "dummy" queries.) In this setting algorithm $A$ corresponds
to a distribution $P_A$ over $q$-tuples $\bar{I} = (I_1, \dots, I_q)$ of query intervals. The following theorem will
directly imply Theorem 16 in the case of non-adaptive algorithms:

Theorem 17

$$\left| \Pr_{D \leftarrow \mathcal{P}_{\mathrm{No}}}\left[A^{\mathrm{ICOND}_D} \text{ outputs ACCEPT}\right] - \Pr\left[A^{\mathrm{ICOND}_{\mathcal{U}}} \text{ outputs ACCEPT}\right] \right| \le 1/5. \qquad (77)$$

Observe that in the first probability of Equation (77) the randomness is taken over the draw of $D$
from $\mathcal{P}_{\mathrm{No}}$, the draw of $\bar{I} \sim P_A$ that $A$ performs to select its sequence of query intervals, and the
randomness of the ICOND$_D$ oracle. In the second one the randomness is just over the draw of $\bar{I}$
from $P_A$ and the randomness of the ICOND$_{\mathcal{U}}$ oracle.

Intuition for Theorem 17. The high-level idea is that the algorithm will not be able to distinguish
between the uniform distribution and a "No"-distribution unless it manages to learn something
about the "structure" of the blocks in the "No"-case, either by guessing (roughly) the right
block size, or by guessing (roughly) the location of a block endpoint and querying a short interval
containing such an endpoint.

In more detail, we define the following "bad events" (over the choice of $D$ and the points $s_\ell$) for
a fixed sequence $\bar{I} = (I_1, \dots, I_q)$ of queries (the dependence on $\bar{I}$ is omitted in the notation for the
sake of readability):

$$B^{\mathrm{N}}_{\mathrm{size}} = \{ \exists \ell \in [q] : \Delta/\log N \le |I_\ell| \le \Delta \cdot (\log N)^2 \}$$
$$B^{\mathrm{N}}_{\mathrm{boundary}} = \{ \exists \ell \in [q] : |I_\ell| < \Delta/\log N \text{ and } I_\ell \text{ intersects two blocks} \}$$
$$B^{\mathrm{N}}_{\ell,\mathrm{outer}} = \{ \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } s_\ell \text{ belongs to a block not contained entirely in } I_\ell \}, \quad \ell \in [q]$$
$$B^{\mathrm{N}}_{\ell,\mathrm{collide}} = \{ \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } \exists j < \ell,\ s_\ell \text{ and } s_j \text{ belong to the same block} \}, \quad \ell \in [q]$$

The first two events depend only on the draw of $D$ from $\mathcal{P}_{\mathrm{No}}$, which determines $\Delta$ and $y$, while
the last $2q$ events also depend on the random draws of $s_\ell$ from the ICOND$_D$ oracle. We define in the
same fashion the corresponding bad events for the "Yes"-instance (i.e. the uniform distribution $\mathcal{U}$),
$B^{\mathrm{Y}}_{\mathrm{size}}$, $B^{\mathrm{Y}}_{\mathrm{boundary}}$, $B^{\mathrm{Y}}_{\ell,\mathrm{outer}}$ and $B^{\mathrm{Y}}_{\ell,\mathrm{collide}}$, using the notion of the "fake construction" of $\mathcal{U}$ mentioned
above.

Events $B^{\mathrm{N}}_{\mathrm{size}}$ and $B^{\mathrm{Y}}_{\mathrm{size}}$ correspond to the possibility, mentioned above, that algorithm $A$ "guesses"
essentially the right block size, and events $B^{\mathrm{N}}_{\mathrm{boundary}}$ and $B^{\mathrm{Y}}_{\mathrm{boundary}}$ correspond to the possibility
that algorithm $A$ "guesses" a short interval containing a block endpoint. The final bad events
correspond to $A$ guessing a "too-large" block size but "getting lucky" with the sample returned by
ICOND, either because the sample belongs to one of the (at most two) outer blocks not entirely
contained in the query interval, or because $A$ has already received a sample from the same block
as the current sample.

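We can now describe the failure events for both the uniform distribution and for a "No"-distribution
as the union of the corresponding bad events:

$$B^{\mathrm{N}}_{(\bar{I})} = B^{\mathrm{N}}_{\mathrm{size}} \cup B^{\mathrm{N}}_{\mathrm{boundary}} \cup \Big( \bigcup_{\ell=1}^{q} B^{\mathrm{N}}_{\ell,\mathrm{outer}} \Big) \cup \Big( \bigcup_{\ell=1}^{q} B^{\mathrm{N}}_{\ell,\mathrm{collide}} \Big)$$
$$B^{\mathrm{Y}}_{(\bar{I})} = B^{\mathrm{Y}}_{\mathrm{size}} \cup B^{\mathrm{Y}}_{\mathrm{boundary}} \cup \Big( \bigcup_{\ell=1}^{q} B^{\mathrm{Y}}_{\ell,\mathrm{outer}} \Big) \cup \Big( \bigcup_{\ell=1}^{q} B^{\mathrm{Y}}_{\ell,\mathrm{collide}} \Big)$$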




These failure events can be interpreted, from the point of view of the algorithm $A$, as the
"opportunity to potentially learn something;" we shall argue below that if the failure events do not
occur then the algorithm gains no information about whether it is interacting with the uniform
distribution or with a "No"-distribution.

Structure of the proof of Theorem 17. First, observe that since the transcript is the result
of the interaction of the algorithm and the oracle on a randomly chosen distribution, it is itself a
random variable; we will be interested in the distribution over this random variable induced by the
draws from the oracle and the choice of $D$. More precisely, for a fixed sequence of query sets $\bar{I}$, let
$Z^{\mathrm{N}}_{\bar{I}}$ denote the random variable over "No"-transcripts generated when $D$ is drawn from $\mathcal{P}_{\mathrm{No}}$. Note
that this is a random variable over the probability space defined by the random draw of $D$ and the
draws of $s_\ell$ by ICOND$_D(I_\ell)$. We define $\mathfrak{A}^{\mathrm{N}}_{\bar{I}}$ as the resulting distribution over these "No"-transcripts.
Similarly, $Z^{\mathrm{Y}}_{\bar{I}}$ will be the random variable over "Yes"-transcripts, with corresponding distribution
$\mathfrak{A}^{\mathrm{Y}}_{\bar{I}}$.

As noted earlier, the non-adaptive algorithm $A$ corresponds to a distribution $P_A$ over $q$-tuples $\bar{I}$
of query intervals. We define $\mathfrak{A}^{\mathrm{Y}}$ as the distribution over transcripts corresponding to first drawing
$\bar{I}$ from $P_A$ and then making a draw from $\mathfrak{A}^{\mathrm{Y}}_{\bar{I}}$. Similarly, we define $\mathfrak{A}^{\mathrm{N}}$ as the distribution over
transcripts corresponding to first drawing $\bar{I}$ from $P_A$ and then making a draw from $\mathfrak{A}^{\mathrm{N}}_{\bar{I}}$.

To prove Theorem 17 it is sufficient to show that the two distributions over transcripts described
above are statistically close:

Lemma 28 $d_{\mathrm{TV}}(\mathfrak{A}^{\mathrm{Y}}, \mathfrak{A}^{\mathrm{N}}) \le 1/5$.



The proof of this lemma is structured as follows: first, for any fixed sequence of $q$ queries $\bar{I}$, we
bound the probability of the failure events, both for the uniform and the "No"-distributions:

Claim 29 For each fixed sequence $\bar{I}$ of $q$ query intervals, we have

$$\Pr\big[B^{\mathrm{Y}}_{(\bar{I})}\big] \le 1/10 \quad \text{and} \quad \Pr_{D \leftarrow \mathcal{P}_{\mathrm{No}}}\big[B^{\mathrm{N}}_{(\bar{I})}\big] \le 1/10.$$

(Note that the first probability above is taken only over the randomness of the ICOND$_{\mathcal{U}}$ responses,
while the second is over the random draw of $D \sim \mathcal{P}_{\mathrm{No}}$ and over the ICOND$_D$ responses.)

Next we show that, provided the failure events do not occur, the distribution over transcripts
is exactly the same in both cases:



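Claim 30 Fix any sequence $\bar{I} = (I_1, \dots, I_q)$ of $q$ queries. Then, conditioned on their respective
failure events not happening, $Z^{\mathrm{N}}_{\bar{I}}$ and $Z^{\mathrm{Y}}_{\bar{I}}$ are identically distributed:

$$\text{for every transcript } T = ((I_1, s_1), \dots, (I_q, s_q)), \quad \Pr\Big[Z^{\mathrm{N}}_{\bar{I}} = T \ \Big|\ \overline{B^{\mathrm{N}}_{(\bar{I})}}\Big] = \Pr\Big[Z^{\mathrm{Y}}_{\bar{I}} = T \ \Big|\ \overline{B^{\mathrm{Y}}_{(\bar{I})}}\Big].$$

Finally we combine these two claims to show that the two overall distributions of transcripts are
statistically close:

Claim 31 Fix any sequence of $q$ queries $\bar{I} = (I_1, \dots, I_q)$. Then $d_{\mathrm{TV}}(\mathfrak{A}^{\mathrm{N}}_{\bar{I}}, \mathfrak{A}^{\mathrm{Y}}_{\bar{I}}) \le 1/5$.

Lemma 28 (and thus Theorem 17) directly follows from Claim 31 since, using the notation
$\bar{s} = (s_1, \dots, s_q)$ for a sequence of $q$ answers to a sequence $\bar{I} = (I_1, \dots, I_q)$ of $q$ queries, which
together define a transcript $T(\bar{I}, \bar{s}) = ((I_1, s_1), \dots, (I_q, s_q))$,

$$d_{\mathrm{TV}}(\mathfrak{A}^{\mathrm{Y}}, \mathfrak{A}^{\mathrm{N}}) = \frac{1}{2} \sum_{\bar{I}} \sum_{\bar{s}} \Big| P_A(\bar{I}) \cdot \Pr\big[Z^{\mathrm{Y}}_{\bar{I}} = T(\bar{I}, \bar{s})\big] - P_A(\bar{I}) \cdot \Pr\big[Z^{\mathrm{N}}_{\bar{I}} = T(\bar{I}, \bar{s})\big] \Big|$$
$$= \frac{1}{2} \sum_{\bar{I}} P_A(\bar{I}) \cdot \sum_{\bar{s}} \Big| \Pr\big[Z^{\mathrm{Y}}_{\bar{I}} = T(\bar{I}, \bar{s})\big] - \Pr\big[Z^{\mathrm{N}}_{\bar{I}} = T(\bar{I}, \bar{s})\big] \Big|$$
$$\le \max_{\bar{I}} \big\{ d_{\mathrm{TV}}(\mathfrak{A}^{\mathrm{Y}}_{\bar{I}}, \mathfrak{A}^{\mathrm{N}}_{\bar{I}}) \big\} \le 1/5. \qquad (78)$$

This concludes the proof of Lemma 28 modulo the proofs of the above claims; we give those proofs
in Section 8.2.2 below.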



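8.2.2 Proof of Claims 29 to 31

To prove Claim 29 we bound the probability of each of the bad events separately, starting with the
"No"-case.

(i) Defining the event $B^{\mathrm{N}}_{\ell,\mathrm{size}}$ as

$$B^{\mathrm{N}}_{\ell,\mathrm{size}} = \{ \Delta/\log N \le |I_\ell| \le \Delta \cdot (\log N)^2 \},$$

we can use a union bound to get $\Pr[B^{\mathrm{N}}_{\mathrm{size}}] \le \sum_{\ell=1}^{q} \Pr[B^{\mathrm{N}}_{\ell,\mathrm{size}}]$. For any fixed setting of
$I_\ell$ there are $O(\log\log N)$ values of $\Delta \in \{ \frac{N}{2^X} \mid X \in \{\frac{1}{3}\log N, \dots, \frac{2}{3}\log N\} \}$ for which
$\Delta/\log N \le |I_\ell| \le \Delta \cdot (\log N)^2$. Hence we have $\Pr[B^{\mathrm{N}}_{\ell,\mathrm{size}}] = O((\log\log N)/\log N)$, and
consequently $\Pr[B^{\mathrm{N}}_{\mathrm{size}}] = O(q(\log\log N)/\log N)$.

(ii) Similarly, defining the event $B^{\mathrm{N}}_{\ell,\mathrm{boundary}}$ as

$$B^{\mathrm{N}}_{\ell,\mathrm{boundary}} = \{ |I_\ell| < \Delta/\log N \text{ and } I_\ell \text{ intersects two blocks} \},$$

we have $\Pr[B^{\mathrm{N}}_{\mathrm{boundary}}] \le \sum_{\ell=1}^{q} \Pr[B^{\mathrm{N}}_{\ell,\mathrm{boundary}}]$. For any fixed setting of $I_\ell$, recalling the choice
of a uniform random offset $y \in [N]$ for the blocks, we have that $\Pr[B^{\mathrm{N}}_{\ell,\mathrm{boundary}}] \le O(1/\log N)$,
and consequently $\Pr[B^{\mathrm{N}}_{\mathrm{boundary}}] = O(q/\log N)$.

(iii) Fix $\ell \in [q]$ and recall that $B^{\mathrm{N}}_{\ell,\mathrm{outer}} = \{ \Delta \cdot (\log N)^2 < |I_\ell|$ and $s_\ell$ is drawn from a block
$\not\subseteq I_\ell \}$. Fix any outcome for $\Delta$ such that $\Delta \cdot (\log N)^2 < |I_\ell|$ and let us consider only the
randomness over the draw of $s_\ell$ from $I_\ell$. Since there are $\Omega((\log N)^2)$ blocks contained entirely
in $I_\ell$, the probability that $s_\ell$ is drawn from a block not contained entirely in $I_\ell$ (there are at
most two such blocks, one at each end of $I_\ell$) is $O(1/(\log N)^2)$. Hence we have $\Pr[B^{\mathrm{N}}_{\ell,\mathrm{outer}}] \le
O(1)/(\log N)^2$.

(iv) Finally, recall that

$$B^{\mathrm{N}}_{\ell,\mathrm{collide}} = \{ \Delta \cdot (\log N)^2 < |I_\ell| \text{ and } \exists j < \ell \text{ s.t. } s_\ell \text{ and } s_j \text{ belong to the same block} \}.$$

Fix $\ell \in [q]$ and a query interval $I_\ell$. Let $r_\ell$ be the number of blocks in $I_\ell$ within which
resides some previously sampled point $s_j$, $j \in [\ell-1]$. Since there are $\Omega((\log N)^2)$ blocks in
$I_\ell$ and $r_\ell \le \ell - 1$, the probability that $s_\ell$ is drawn from a block containing any $s_j$, $j < \ell$, is
$O(\ell/(\log N)^2)$. Hence we have $\Pr[B^{\mathrm{N}}_{\ell,\mathrm{collide}}] = O(\ell/(\log N)^2)$.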
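
The counting step in item (i) can be checked numerically. The sketch below assumes $N = 2^{\log N}$ with base-2 logarithms and that $X$ ranges over the integers between $\lceil \log N/3 \rceil$ and $\lfloor 2\log N/3 \rfloor$; the function name is hypothetical. Since the candidate block sizes are powers of two and the admissible window for $\Delta$ spans a factor of $(\log N)^3$, at most $3\log_2\log N + 1$ of them can qualify for any fixed interval length.

```python
import math

def count_bad_block_sizes(log_n, interval_len):
    """Number of admissible X for which Delta = N / 2**X satisfies
    Delta / log N <= interval_len <= Delta * (log N)**2."""
    N = 2 ** log_n
    L = log_n  # log N in base 2
    xs = range(-(-log_n // 3), (2 * log_n) // 3 + 1)
    return sum(1 for X in xs
               if (N >> X) / L <= interval_len <= (N >> X) * L ** 2)
```

Dividing this count by the roughly $(\log N)/3$ possible values of $X$ gives the $O((\log\log N)/\log N)$ bound on the probability of the size event for a single query.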



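With these probability bounds for bad events in hand, we can prove Claim 29.

Proof of Claim 29: Recall that $q \le \tau \cdot \frac{\log N}{\log\log N}$. Recalling the definition of $B^{\mathrm{N}}_{(\bar{I})}$, a union bound
yields

$$\Pr\big[B^{\mathrm{N}}_{(\bar{I})}\big] \le \Pr\big[B^{\mathrm{N}}_{\mathrm{size}}\big] + \Pr\big[B^{\mathrm{N}}_{\mathrm{boundary}}\big] + \sum_{\ell=1}^{q} \Pr\big[B^{\mathrm{N}}_{\ell,\mathrm{outer}}\big] + \sum_{\ell=1}^{q} \Pr\big[B^{\mathrm{N}}_{\ell,\mathrm{collide}}\big]$$
$$= O\Big(q \cdot \frac{\log\log N}{\log N}\Big) + O\Big(\frac{q}{\log N}\Big) + \sum_{\ell=1}^{q} O\Big(\frac{1}{(\log N)^2}\Big) + \sum_{\ell=1}^{q} O\Big(\frac{\ell}{(\log N)^2}\Big)$$
$$\le \frac{1}{10},$$

where the last inequality holds for a sufficiently small choice of the absolute constant $\tau$.

The same analysis applies unchanged for $\Pr[B^{\mathrm{Y}}_{\mathrm{size}}]$ and $\Pr[B^{\mathrm{Y}}_{\mathrm{boundary}}]$, using the "fake construction"
view of $\mathcal{U}$ as described earlier. The arguments for $\Pr[B^{\mathrm{Y}}_{\ell,\mathrm{outer}}]$ and $\Pr[B^{\mathrm{Y}}_{\ell,\mathrm{collide}}]$ go through
unchanged as well, and Claim 29 is proved. ∎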



Proof of Claim 30: Fix any $\bar{I} = (I_1, \dots, I_q)$ and any transcript $T = ((I_1, s_1), \dots, (I_q, s_q))$.
Recall that the length-$\ell$ partial transcript $T|_\ell$ is defined to be $((I_1, s_1), \dots, (I_\ell, s_\ell))$. We define the
random variables $Z^{\mathrm{N},\ell}_{\bar{I}}$ and $Z^{\mathrm{Y},\ell}_{\bar{I}}$ to be the length-$\ell$ prefixes of $Z^{\mathrm{N}}_{\bar{I}}$ and $Z^{\mathrm{Y}}_{\bar{I}}$ respectively. We prove
Claim 30 by establishing the following, which we prove by induction on $\ell$:

$$\Pr\Big[Z^{\mathrm{N},\ell}_{\bar{I}} = T|_\ell \ \Big|\ \overline{B^{\mathrm{N}}_{(\bar{I})}}\Big] = \Pr\Big[Z^{\mathrm{Y},\ell}_{\bar{I}} = T|_\ell \ \Big|\ \overline{B^{\mathrm{Y}}_{(\bar{I})}}\Big]. \qquad (79)$$

For the base case, it is clear that (79) holds with $\ell = 0$. For the inductive step, suppose (79) holds
for all $k \in [\ell - 1]$. When querying $I_\ell$ at the $\ell$-th step, one of the following cases must hold (since
we conditioned on the "bad events" not happening):

(1) $I_\ell$ is contained within a half-block (more precisely, either entirely within the first half of a
block or entirely within the second half). In this case the "yes" and "no" distribution oracles
behave exactly the same since both generate $s_\ell$ by sampling uniformly from $I_\ell$.

(2) $I_\ell$ contains many blocks and $s_\ell$ belongs to a block, contained entirely in $I_\ell$, which is "fresh" in
the sense that it contains no $s_j$, $j < \ell$. In the "No"-case this block may either be high-low or
low-high; but since both outcomes have the same probability, there is another transcript with
equal probability in which the two profiles are switched. Consequently (over the randomness
in the draw of $D \sim \mathcal{P}_{\mathrm{No}}$) the probability of picking $s_\ell$ in the "No"-distribution case is the
same as in the uniform distribution case (i.e., uniform on $I_\ell$).

(3) $I_\ell$ is contained within one block, but not within one half-block (i.e. $I_\ell$ intersects both the
first and second halves of its block). By the same symmetry argument as in (2), the profile
of this block could have been switched, and hence the distribution of $s_\ell$ is uniform over $I_\ell$.

This concludes the proof of Claim 30. ∎

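
The symmetry argument of case (2) can be checked by brute force on a toy instance. The sketch below makes several simplifying assumptions for illustration: a single query with no previous samples, a query interval that is exactly a union of whole blocks, no offset, weights $(1 \pm 2\epsilon)/N$, and 0-indexed points. It averages the oracle's response distribution over all equally likely profile vectors.

```python
from fractions import Fraction
from itertools import product

def no_weights(profile, delta, eps, N):
    """Point weights of a "No"-distribution with the given per-block profiles."""
    D = []
    for p in profile:
        first, second = (1 - 2 * eps, 1 + 2 * eps) if p == "LH" else (1 + 2 * eps, 1 - 2 * eps)
        D += [first / N] * (delta // 2) + [second / N] * (delta // 2)
    return D

def averaged_response(b, delta, eps, lo, hi):
    """Average, over all 2**b profile vectors, of the conditional distribution
    of one sample drawn from the points lo, ..., hi - 1."""
    N = b * delta
    profiles = list(product(("LH", "HL"), repeat=b))
    avg = [Fraction(0)] * (hi - lo)
    for profile in profiles:
        D = no_weights(profile, delta, eps, N)
        mass = sum(D[lo:hi])  # identical for every profile: whole blocks only
        for i in range(lo, hi):
            avg[i - lo] += D[i] / mass / len(profiles)
    return avg
```

Because every whole block has weight $\Delta/N$ regardless of its profile, the normalising mass does not depend on the profile, and averaging the two equally likely profiles of a block flattens its points to equal probability.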
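Proof of Claim 31: Given Claims 29 and 30, Claim 31 is an immediate consequence of the
following basic fact:

Fact 32 Let $D_1$, $D_2$ be two distributions over the same finite set $X$. Let $E_1$, $E_2$ be two events
such that $D_i[E_i] = \alpha_i \le \alpha$ for $i = 1, 2$ and the conditional distributions $(D_i)_{\overline{E_i}}$ are identical, i.e.
$d_{\mathrm{TV}}((D_1)_{\overline{E_1}}, (D_2)_{\overline{E_2}}) = 0$. Then $d_{\mathrm{TV}}(D_1, D_2) \le \alpha$.

Proof: We first observe that since $(D_2)_{\overline{E_2}}(E_2) = 0$ and $(D_1)_{\overline{E_1}}$ is identical to $(D_2)_{\overline{E_2}}$, it must
be the case that $(D_1)_{\overline{E_1}}(E_2) = 0$, and likewise $(D_2)_{\overline{E_2}}(E_1) = 0$. This implies that $D_1(E_2 \setminus E_1) =
D_2(E_1 \setminus E_2) = 0$. Now let us write

$$2 d_{\mathrm{TV}}(D_1, D_2) = \sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| + \sum_{x \in E_1 \cap E_2} |D_1(x) - D_2(x)|$$
$$\qquad + \sum_{x \in E_1 \setminus E_2} |D_1(x) - D_2(x)| + \sum_{x \in E_2 \setminus E_1} |D_1(x) - D_2(x)|.$$

We may upper bound $\sum_{x \in E_1 \cap E_2} |D_1(x) - D_2(x)|$ by $\sum_{x \in E_1 \cap E_2} (D_1(x) + D_2(x)) = D_1(E_1 \cap E_2) +
D_2(E_1 \cap E_2)$, and the above discussion gives $\sum_{x \in E_1 \setminus E_2} |D_1(x) - D_2(x)| = D_1(E_1 \setminus E_2)$ and
$\sum_{x \in E_2 \setminus E_1} |D_1(x) - D_2(x)| = D_2(E_2 \setminus E_1)$. We thus have

$$2 d_{\mathrm{TV}}(D_1, D_2) \le \sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| + D_1(E_1) + D_2(E_2)$$
$$= \sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| + \alpha_1 + \alpha_2.$$

Finally, since $d_{\mathrm{TV}}((D_1)_{\overline{E_1}}, (D_2)_{\overline{E_2}}) = 0$, we have

$$\sum_{x \in X \setminus (E_1 \cup E_2)} |D_1(x) - D_2(x)| = \big| D_1(X \setminus (E_1 \cup E_2)) - D_2(X \setminus (E_1 \cup E_2)) \big|$$
$$= \big| D_1(\overline{E_1}) - D_2(\overline{E_2}) \big| = |\alpha_1 - \alpha_2|.$$

Thus $2 d_{\mathrm{TV}}(D_1, D_2) \le |\alpha_1 - \alpha_2| + \alpha_1 + \alpha_2 = 2\max\{\alpha_1, \alpha_2\} \le 2\alpha$, and the fact is established. ∎

This concludes the proof of Claim 31. ∎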
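
Fact 32 can be sanity-checked on a small example. The sketch below builds two hypothetical distributions over six points that share the same conditional distribution outside their respective bad events (here $E_1 = \{0, 1\}$ with mass $1/10$ and $E_2 = \{1, 2\}$ with mass $1/5$) and computes their variation distance exactly; the helper names and the specific numbers are illustrative only.

```python
from fractions import Fraction

def tv(p, q):
    """Total variation distance between two distributions given as lists."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def build(alpha, bad_points, bad_split, common, size):
    """Distribution putting mass alpha on bad_points (split as bad_split)
    and mass (1 - alpha) * common[x] on every other point x."""
    D = [Fraction(0)] * size
    for x, c in common.items():
        D[x] = (1 - alpha) * c
    for x, share in zip(bad_points, bad_split):
        D[x] += alpha * share
    return D

common = {3: Fraction(1, 2), 4: Fraction(1, 4), 5: Fraction(1, 4)}
D1 = build(Fraction(1, 10), (0, 1), (Fraction(1, 2), Fraction(1, 2)), common, 6)
D2 = build(Fraction(1, 5), (1, 2), (Fraction(1, 4), Fraction(3, 4)), common, 6)
distance = tv(D1, D2)
```

In this instance the distance is below the larger of the two bad-event masses, as the fact predicts.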



8.2.3 A lower bound against adaptive algorithms: Proof of Theorem 16

Throughout this subsection $A$ denotes a general adaptive algorithm that makes $q \le \tau \cdot \frac{\log N}{\log\log N}$
queries, where as before $\tau > 0$ is an absolute constant. Theorem 16 is a consequence of the
following theorem, which deals with adaptive algorithms:



Theorem 18

$$\left| \Pr_{D \leftarrow \mathcal{P}_{\mathrm{No}}}\left[A^{\mathrm{ICOND}_D} \text{ outputs ACCEPT}\right] - \Pr\left[A^{\mathrm{ICOND}_{\mathcal{U}}} \text{ outputs ACCEPT}\right] \right| \le 1/4. \qquad (80)$$

The idea here is to use the previous analysis for non-adaptive algorithms, and argue that
"adaptiveness does not really help" to distinguish between $D = \mathcal{U}$ and $D \sim \mathcal{P}_{\mathrm{No}}$ given access to
ICOND$_D$.

As in Section 5.2, we will need the notion of an algorithm faking queries. Given an adaptive
algorithm $A$, we define $A^{(1)}$ as the algorithm that fakes its first query, in the following sense: If
the first query made by $A$ to the oracle is some interval $I$, then the algorithm $A^{(1)}$ does not call
ICOND on $I$ but instead chooses a point $s$ uniformly at random from $I$ and then behaves exactly
as $A$ would behave if the ICOND oracle had returned $s$ in response to the query $I$. More generally,
we define $A^{(k)}$ for all $k \ge 0$ as the algorithm behaving like $A$ but faking its first $k$ queries (note
that $A^{(0)} = A$).
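
Query faking is easy to picture as a wrapper around the oracle. The sketch below is an illustration with hypothetical class names: `IntervalOracle` is a toy stand-in for ICOND over a fully known list of weights (0-indexed points), and `FakingOracle` answers the first $k$ queries with a uniform point of the queried interval before deferring to the real oracle. Running an algorithm against `FakingOracle(real, k, rng)` then corresponds to running the variant that fakes its first $k$ queries.

```python
import random

class IntervalOracle:
    """Toy ICOND oracle: returns a point of {lo, ..., hi} with probability
    proportional to its weight under D."""
    def __init__(self, D, rng):
        self.D, self.rng, self.calls = D, rng, 0

    def query(self, lo, hi):
        self.calls += 1
        points = list(range(lo, hi + 1))
        return self.rng.choices(points, weights=[self.D[i] for i in points])[0]

class FakingOracle:
    """Answers the first k queries uniformly at random from the queried
    interval, then forwards all later queries to the real oracle."""
    def __init__(self, real, k, rng):
        self.real, self.k, self.rng, self.asked = real, k, rng, 0

    def query(self, lo, hi):
        self.asked += 1
        if self.asked <= self.k:
            return self.rng.randint(lo, hi)
        return self.real.query(lo, hi)
```

When the underlying distribution is uniform, a faked answer and a real answer have the same distribution, which is why only the "No" case needs a separate family of transcript distributions.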

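Let $\mathfrak{A}^{(k),\mathrm{N}}$ denote the distribution over transcripts (of length $q$) when the distribution $D$ is
drawn from $\mathcal{P}_{\mathrm{No}}$ and the algorithm is $A^{(k)}$. Note that by this definition we have $\mathfrak{A}^{(0),\mathrm{N}} = \mathfrak{A}^{\mathrm{N}}$ and
$\mathfrak{A}^{(q),\mathrm{N}} = \mathfrak{A}^{\mathrm{Y}}$. For a given outcome $\Delta$ of the block size and $y$ of the offset in the construction of
$\mathcal{P}_{\mathrm{No}}$, we write $\mathfrak{A}^{(k),\mathrm{N}}_{\Delta,y}$ to denote the distribution $\mathfrak{A}^{(k),\mathrm{N}}$ conditioned on that particular outcome of
the block size and offset.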

As in the non-adaptive case, in order to prove Theorem [18] it is sufficient to prove that the
transcripts for uniform and "No"-distributions are close in total variation distance; i.e., that

$$d_{\rm TV}(\mathfrak{A}^{N}, \mathfrak{A}^{Y}) \le 1/4. \qquad (81)$$

The key lemma used to prove this is the following, which bounds the variation distance
between the transcripts of $A^{(k)}$ (the variant of $A$ that fakes its first $k$ queries) and $A^{(k+1)}$,
conditioned on a high-probability event over the choice of block size and offset.

Lemma 33 For all $k \ge 0$, with probability at least $1 - \eta(N)$ over the choice of $(\Delta, y)$ in the draw
of $D$ from $\mathcal{P}_{\rm No}$, we have

$$d_{\rm TV}\left(\mathfrak{A}^{(k),N}_{\Delta,y}, \mathfrak{A}^{(k+1),N}_{\Delta,y}\right) \le \beta(k, N) \qquad (82)$$

where $\eta(N) = O\left(\frac{\log\log N}{\log N}\right)$ and $\beta(k, N) = O\left(\frac{k}{(\log N)^2}\right)$.

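To see why Equation (81) follows from this lemma, first observe that, by the triangle inequality,

$$d_{\rm TV}(\mathfrak{A}^{N}, \mathfrak{A}^{Y}) = d_{\rm TV}(\mathfrak{A}^{(0),N}, \mathfrak{A}^{(q),N}) \le \sum_{k=0}^{q-1} d_{\rm TV}(\mathfrak{A}^{(k),N}, \mathfrak{A}^{(k+1),N}).$$

Let us say that a pair $(\Delta, y)$ for which Equation (82) holds is good. We now require the following
variant of Fact [32]: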

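Fact 34 Let $D_1, D_2$ be two distributions over the same finite set $X$. Let $E$ be an event such that
$D_i[E] = \alpha_i \le \alpha$ for $i = 1, 2$ and the conditional distributions $(D_1)_{\overline{E}}$ and $(D_2)_{\overline{E}}$ are statistically
close, i.e. $d_{\rm TV}((D_1)_{\overline{E}}, (D_2)_{\overline{E}}) = \beta$. Then $d_{\rm TV}(D_1, D_2) \le \alpha + \beta$.

Footnote: We observe that there is no point in defining the corresponding distribution over transcripts for the "yes" case,
as faking a call is exactly the same as querying the oracle when the underlying distribution is uniform.

Proof: As in the proof of Fact [32], let us write

$$2\, d_{\rm TV}(D_1, D_2) = \sum_{x \in X \setminus E} |D_1(x) - D_2(x)| + \sum_{x \in E} |D_1(x) - D_2(x)|.$$

We may upper bound $\sum_{x \in E} |D_1(x) - D_2(x)|$ by $\sum_{x \in E} (D_1(x) + D_2(x)) = D_1(E) + D_2(E) = \alpha_1 + \alpha_2$;
furthermore,

$$\sum_{x \in \overline{E}} |D_1(x) - D_2(x)| = \sum_{x \in \overline{E}} \left| (D_1)_{\overline{E}}(x) \cdot D_1(\overline{E}) - (D_2)_{\overline{E}}(x) \cdot D_2(\overline{E}) \right|$$

$$\le D_1(\overline{E}) \cdot \sum_{x \in \overline{E}} \left| (D_1)_{\overline{E}}(x) - (D_2)_{\overline{E}}(x) \right| + \left| D_1(\overline{E}) - D_2(\overline{E}) \right| \cdot (D_2)_{\overline{E}}(\overline{E})$$

$$\le (1 - \alpha_1) \cdot (2\beta) + |\alpha_2 - \alpha_1| \cdot 1 \le 2\beta + |\alpha_2 - \alpha_1|.$$

Thus $2\, d_{\rm TV}(D_1, D_2) \le 2\beta + |\alpha_1 - \alpha_2| + \alpha_1 + \alpha_2 = 2\beta + 2\max\{\alpha_1, \alpha_2\} \le 2(\alpha + \beta)$, and the fact is
established. ■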
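As a numerical sanity check of Fact 34 (not a substitute for the proof), the following Python sketch evaluates the slack $\alpha + \beta - d_{\rm TV}(D_1, D_2)$ on explicit distributions. The function names are illustrative only, and distributions are represented as plain lists of probabilities indexed by the elements of $X = \{0, \dots, n-1\}$.

```python
import random


def tv_distance(p, q):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def conditioned_on(p, support):
    """Distribution p conditioned on landing in `support` (a set of indices)."""
    mass = sum(p[x] for x in support)
    return [p[x] / mass if x in support else 0.0 for x in range(len(p))]


def fact34_slack(p1, p2, event):
    """Return (alpha + beta) - d_TV(p1, p2); Fact 34 states this is >= 0."""
    outside = set(range(len(p1))) - set(event)
    alpha = max(sum(p1[x] for x in event), sum(p2[x] for x in event))
    beta = tv_distance(conditioned_on(p1, outside), conditioned_on(p2, outside))
    return alpha + beta - tv_distance(p1, p2)


def random_distribution(n, rng):
    """Random distribution with strictly positive mass on every point."""
    weights = [rng.random() + 0.01 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]
```

For instance, `fact34_slack([0.1, 0.45, 0.45], [0.2, 0.4, 0.4], {0})` has $\beta = 0$ and $\alpha = 0.2$ against a variation distance of $0.1$, so the bound holds with slack $0.1$ (up to floating-point error).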

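Applying this fact to the distributions $\mathfrak{A}^{(k),N}$ and $\mathfrak{A}^{(k+1),N}$ and the event $\overline{E} = \{(\Delta, y) \text{ is good}\}$,
we get that $d_{\rm TV}(\mathfrak{A}^{(k),N}, \mathfrak{A}^{(k+1),N}) \le \eta(N) + \beta(k, N)$, and consequently we have $d_{\rm TV}(\mathfrak{A}^{N}, \mathfrak{A}^{Y}) \le
\sum_{k=0}^{q-1} (\eta(N) + \beta(k, N))$, which is at most $1/4$ for a suitable choice of the absolute constant $\tau$.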
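To see concretely how the choice of $\tau$ enters, the sum can be evaluated numerically. The sketch below is an illustration under assumptions that are not in the text: the hidden constants in $\eta(N)$ and $\beta(k, N)$ are both taken to be 1 (parameters `c_eta` and `c_beta`), logarithms are base 2, and $q$ is taken as large as the bound $q \le \tau \cdot \log N / \log\log N$ permits.

```python
import math


def transcript_distance_bound(log_n, tau, c_eta=1.0, c_beta=1.0):
    """Evaluate sum_{k=0}^{q-1} (eta(N) + beta(k, N)) for q = floor(tau * log N / log log N).

    `log_n` is log2(N); passing the logarithm avoids huge integers.
    c_eta and c_beta are ASSUMED values for the constants hidden in the O(.) bounds.
    """
    log_log_n = math.log2(log_n)
    q = math.floor(tau * log_n / log_log_n)
    eta = c_eta * log_log_n / log_n
    total = sum(eta + c_beta * k / log_n ** 2 for k in range(q))
    return q, total
```

Under these assumed constants, $q \cdot \eta(N) \le \tau$ and the $\beta$ terms contribute at most $\tau^2 / (2 (\log\log N)^2)$, so a value such as $\tau = 0.2$ keeps the sum below $1/4$, whereas $\tau = 1$ does not.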

Thus it remains only to prove Lemma [33]. In the next subsection we describe an alternative
view of the random draw of $D \sim \mathcal{P}_{\rm No}$, and in the following subsection we use this alternative view
to prove the lemma.

8.2.4 Extended transcripts and drawing $D \sim \mathcal{P}_{\rm No}$ on the fly.

Observe that the testing algorithm, seeing only pairs of queries and answers, does not have direct
access to all the underlying information - namely, in the case of a "No"-distribution, whether
the part of the block the sample comes from is high or low. It will be useful for us to consider
an "extended" version of the transcripts, which includes this information though it is not directly
available to the algorithm.

Definition 8 With the same notation as in Definition the extended transcript of a sequence
of queries made by $A$ and the corresponding responses is a sequence $\mathcal{E} = (I_i, s_i, b_i)_{i \in [q]}$ of triples,
where $I_i$ and $s_i$ are as before, and $b_i \in \{\downarrow\uparrow, \uparrow\downarrow\}$ is the profile of the block $s_i$ belongs to. We define
$\mathcal{E}|_k$ to be the length-$k$ prefix of an extended transcript $\mathcal{E}$.

We also consider extended transcripts of algorithms that fake queries. For any $0 \le k \le q$ we let
$\mathfrak{E}^{(k),N}$ denote the distribution over extended transcripts when $D \sim \mathcal{P}_{\rm No}$ and the algorithm is $A^{(k)}$.
We may think of a draw from $\mathfrak{E}^{(k),N}$ in the following way: every time a query $I_i$ from the algorithm
is answered with a point $s_i$ (either answered from the oracle or faked), the corresponding block
profile $b_i$ of $s_i$ is "unveiled" and included as the third component of that triple in the extended
transcript. (We go into significantly more detail on this below.) Observe that if we discard the
third component $b_i$ of each triple, then $\mathfrak{E}^{(0),N}$ corresponds precisely to the distribution $\mathfrak{A}^{N}$ of regular
transcripts and $\mathfrak{E}^{(q),N}$ corresponds precisely to the distribution $\mathfrak{A}^{Y}$ of regular transcripts. Similar
to before, for a fixed outcome $\Delta$ of the block size and $y$ of the offset in the construction of $D \sim \mathcal{P}_{\rm No}$,
we write $\mathfrak{E}^{(k),N}_{\Delta,y}$ to denote the distribution $\mathfrak{E}^{(k),N}$ conditioned on that particular outcome of the block
size and offset.

Our proof of Lemma [33] takes advantage of the fact that one can view the draw of a "No"-distribution
from $\mathcal{P}_{\rm No}$ as being done "on the fly" during the course of algorithm $A$'s (or, more
generally, $A^{(k)}$'s) execution. More precisely, the size $\Delta$ and the offset $y$ are drawn at the very
beginning, but we may view the profile vector $\vartheta$ as having its components chosen independently,
coordinate by coordinate, only as $A$ interacts with ICOND - each time an element $s_i$ is obtained
in response to the $i$-th query, only then is the corresponding element $b_i$ of the profile vector $\vartheta$
chosen (if it was not already determined by previous calls to ICOND). We now describe how the
coordinates of the profile vector $\vartheta$ are generated sequentially as $A$ interacts with ICOND.

Consider the $\ell$-th query $I_\ell$ that $A$ makes to ${\rm ICOND}_D$. Inductively, some coordinates of $\vartheta$ may
have already been set by previous queries. Let $B_1, \dots, B_k$ be the blocks that $I_\ell$ intersects.

1. If both of the outermost blocks $B_1$, $B_k$ have had their bits in $\vartheta$ set already by previous
queries, then these settings ($\downarrow\uparrow$ or $\uparrow\downarrow$) completely determine the probability under ${\rm ICOND}_D$
that each block $B_1, \dots, B_k$ is the block from which $s_\ell$ will be chosen. The algorithm
draws a block $B_i \in \{B_1, \dots, B_k\}$ according to these probabilities. If the coordinate of $\vartheta$
corresponding to $B_i$ has already been determined by a previous query, then $s_\ell$ is drawn
from the correct distribution (as determined by the $\downarrow\uparrow$ or $\uparrow\downarrow$ setting) over $B_i \cap I_\ell$ and $b_\ell$ is set
accordingly. Otherwise a fair coin is tossed, $b_\ell$ is set either to $\downarrow\uparrow$ or to $\uparrow\downarrow$ depending on the
outcome, $s_\ell$ is then drawn from the correct distribution (as determined by $b_\ell$) and the
corresponding coordinate of $\vartheta$ is set to $b_\ell$. The triple $(I_\ell, s_\ell, b_\ell)$ is taken as the $\ell$-th element
of the extended transcript.

2. If either of the outermost blocks $B_1$, $B_k$ (or both) have not had their bits in $\vartheta$ set by previous
queries, then again the probability under ${\rm ICOND}_D$ of having $s_\ell$ belong to each block $B_1, \dots, B_k$
is completely determined (the probability allocated to the "undetermined" outermost blocks
is exactly proportionate to the amount of the block contained in $I_\ell$ - since the block is
still undetermined, the probabilities average out to uniform). The algorithm then proceeds
as above, using the correct probabilities in this case - it draws a block $B_i \in \{B_1, \dots, B_k\}$
according to these probabilities and either sets a new coordinate of $\vartheta$ if necessary or uses the
old coordinate, draws $s_\ell$ from the block $B_i$, and returns $s_\ell, b_\ell$ as above. As before, the triple
$(I_\ell, s_\ell, b_\ell)$ is taken as the $\ell$-th element of the extended transcript.

This completes our description of how one may view the coordinates of the profile vector $\vartheta$ as
being generated sequentially as $A$ interacts with ICOND.

Footnote: We could give an exact expression for these probabilities but it would be cumbersome and is not necessary for
the rest of the argument - all we need is that the probabilities are well defined. Note that any two "internal" blocks
$B_2, \dots, B_{k-1}$ have equal probability regardless of whether or not their coordinates of $\vartheta$ have yet been set.

We next observe that we can also adopt such a view for an algorithm $A^{(k)}$, which fakes the
first $k$ queries. Recall that when an algorithm $A^{(k)}$ fakes one of the first $k$ queries (say the $\ell$-th
query) it draws $s_\ell$ uniformly at random in $I_\ell$. In the corresponding extended transcript $\mathcal{E}|_k$ of $A^{(k)}$
the profile bits are set even though $A^{(k)}$ has no access to them; we now describe how these bits
are set. Consider the $\ell$-th query ($\ell \le k$) that $A^{(k)}$ fakes. Let $B_1, \dots, B_j$ be the blocks that $I_\ell$
intersects. The point $s_\ell$ is chosen uniformly at random from $I_\ell$, the bit $b_\ell$ is set as follows, and the
triple $(I_\ell, s_\ell, b_\ell)$ is taken as the $\ell$-th element of the extended transcript:

1. If $s_\ell$ belongs to a block whose profile was already set during the interaction for the $\ell'$-th query
for some $\ell' < \ell$, then the bit $b_\ell$ is set to $b_\ell = b_{\ell'}$.

2. If $I_\ell$ is completely included within a single half-block, then $b_\ell$ is set uniformly at random to
one of $\{\downarrow\uparrow, \uparrow\downarrow\}$.

3. If $s_\ell$ belongs to a block $B$ that is completely included in $I_\ell$, then we look at the half of the
block $s_\ell$ belongs to, and toss a biased coin to set its profile $b_\ell \in \{\downarrow\uparrow, \uparrow\downarrow\}$: if $s_\ell$ belongs to the
first half, then the coin toss's probabilities are $((1 - 2\epsilon)/2, (1 + 2\epsilon)/2)$; otherwise, they are
$((1 + 2\epsilon)/2, (1 - 2\epsilon)/2)$.

4. If $s_\ell$ belongs to one of the two end blocks, then:

(a) if $I_\ell$ contains at most half of the block that $s_\ell$ belongs to, again $b_\ell$ is set to $\downarrow\uparrow$ or $\uparrow\downarrow$ with
equal probability;

(b) if Ig contains a fraction ^ -|-x of the block that sg belongs to, then the coin is biased, with 
probability for setting the part of the block in which sg lies to "high" either - — ^ if 

2i ■ l+2s 

this part is the small one (the x portion) , and , 2x i-Se ' if sg is in the complete half-block. 

It follows from the foregoing description that in both cases (an algorithm $A$ that does not fake
queries, or an algorithm $A^{(k)}$ that does), the process described above for choosing the profile vector
of $D$ "on the fly" indeed corresponds to drawing $D$ from $\mathcal{P}_{\rm No}$. Similarly, the resulting distributions
over extended transcripts correspond to draws from $\mathfrak{E}^{(0),N}$ (in the $A$ case) and from $\mathfrak{E}^{(k),N}$ (in the
$A^{(k)}$ case) respectively.
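The biased coin of item 3 can be checked with a short exact computation. The sketch below assumes a simplified model that is not spelled out in this section: within a block fully contained in the query interval, the high half carries a $(1 + 2\epsilon)/2$ fraction of the block's mass and the low half a $(1 - 2\epsilon)/2$ fraction, and the profile is a fair coin. The labels `"high-low"` and `"low-high"` are illustrative names for the two profiles (first half high, or first half low). Under that model, sampling from the conditional distribution and then unveiling the profile gives the same joint law as sampling uniformly and then tossing the item-3 coin.

```python
from fractions import Fraction


def joint_from_conditional_sample(eps):
    """Law of (half containing s, profile): fair-coin profile, then s drawn
    from D restricted to the block (ASSUMED mass split (1 +/- 2*eps)/2)."""
    high, low = (1 + 2 * eps) / 2, (1 - 2 * eps) / 2
    prior = Fraction(1, 2)
    return {
        ("first", "high-low"): prior * high,
        ("second", "high-low"): prior * low,
        ("first", "low-high"): prior * low,
        ("second", "low-high"): prior * high,
    }


def joint_from_faked_sample(eps):
    """Law of the same pair when s is uniform over the block and the profile
    is then set by the biased coin described in item 3."""
    high, low = (1 + 2 * eps) / 2, (1 - 2 * eps) / 2
    coin = {
        "first": {"low-high": low, "high-low": high},
        "second": {"low-high": high, "high-low": low},
    }
    return {
        (half, profile): Fraction(1, 2) * coin[half][profile]
        for half in coin
        for profile in coin[half]
    }
```

Passing `eps` as a `Fraction` (for example `Fraction(1, 10)`) keeps the comparison exact; the two dictionaries then coincide entry by entry.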

8.2.5 Proof of Lemma [33] 

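Fix $0 \le k < q$ and recall the definitions of $A^{(k)}$, $A^{(k+1)}$ and $\mathfrak{E}^{(k),N}_{\Delta,y}$, $\mathfrak{E}^{(k+1),N}_{\Delta,y}$. (For the sake of
concision, we shall write $\mathcal{E}^{(k)}$ and $\mathcal{E}^{(k+1)}$ for the transcript random variables, and $\mathfrak{E}^{(k)}$, $\mathfrak{E}^{(k+1)}$ for
their distributions.)

We first observe that prior to the $(k+1)$-st query, the distribution of length-$k$ transcript prefixes
is exactly the same under $A^{(k)}$ and $A^{(k+1)}$, so we have $d_{\rm TV}(\mathcal{E}^{(k)}|_k, \mathcal{E}^{(k+1)}|_k) = 0$. For any fixed
extended transcript $E = (E_1, \dots, E_q)$, we thus have $p_k(E) := \Pr[\, \mathcal{E}^{(k)}|_k = E|_k \,] = \Pr[\, \mathcal{E}^{(k+1)}|_k = E|_k \,]$, and

$$\Pr[\, \mathcal{E}^{(k)}|_{k+1} = E|_{k+1} \,] - \Pr[\, \mathcal{E}^{(k+1)}|_{k+1} = E|_{k+1} \,]$$

$$= \left( \Pr[\, \mathcal{E}^{(k)}_{k+1} = E_{k+1} \mid \mathcal{E}^{(k)}|_k = E|_k \,] - \Pr[\, \mathcal{E}^{(k+1)}_{k+1} = E_{k+1} \mid \mathcal{E}^{(k+1)}|_k = E|_k \,] \right) \cdot p_k(E). \qquad (\dagger)$$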

Consequently it suffices to only analyze the distribution of the $(k+1)$-st component
$(I_{k+1}, s_{k+1}, b_{k+1})$ of the transcript random variables $\mathcal{E}^{(k)}$, $\mathcal{E}^{(k+1)}$. Because both variants $A^{(k)}$ and
$A^{(k+1)}$ of the algorithm select the query interval $I_{k+1}$ based only on the length-$k$ transcript prefix
(identical in both cases, since we conditioned on this in ($\dagger$)) and their internal randomness
(identically distributed in both cases), both algorithms will query any given interval $I_{k+1}$ with the same
probability, so we only have to deal with the remainder of the triple, namely the last two entries
$(s_{k+1}, b_{k+1})$.

Fix any given outcome $I'_{k+1}$ of $I_{k+1}$. We now define a set $G$ of possible outcomes $(\Delta, y)$ for
the block size and the offset in the draw of a "No"-distribution $D \sim \mathcal{P}_{\rm No}$ as follows. The outcomes
$(\Delta, y)$ that are not in $G$ are those that fall into one of the following three categories (the first two
corresponding to two of the "bad events" from our earlier discussion):

(a) $\Delta/\log N \le |I'_{k+1}| \le \Delta \cdot (\log N)^2$;

(b) $|I'_{k+1}| < \Delta/\log N$ and $I'_{k+1}$ intersects two blocks;

(c) $|I'_{k+1}| < \Delta/\log N$ and $I'_{k+1}$ is contained entirely within a single block but not entirely within
a half block (it intersects both the first half and second half of the block it is in).



We claim that $\Pr[(\Delta, y) \in G] \ge 1 - \eta(N)$ (where $\eta(N) = O\left(\frac{\log\log N}{\log N}\right)$ is as defined in Lemma [33]).

This is because the probability of (a) (over a random choice of $\Delta$) is at most $O\left(\frac{\log\log N}{\log N}\right)$ and the
probability of both (b) and (c) (over a random choice of $y$, for any fixed outcome of $\Delta$ satisfying
$|I'_{k+1}| < \Delta/\log N$) is at most $O(1/\log N)$, by applying an argument that is essentially the same as
the one used to argue Items (i) and (ii) in Section [8.2.2].

Fix any pair $(\Delta, y) \in G$. It remains only to argue that Equation (82) holds for such a pair;
from the discussion above, it is enough to consider outcomes $I'_{k+1}$ for $I_{k+1}$ that fall into one of the
following categories:

(d) $|I'_{k+1}| < \Delta/\log N$ and $I'_{k+1}$ is contained entirely within a single half block (either the first
half of a block or the second half of a block);

(e) $|I'_{k+1}| > \Delta \cdot (\log N)^2$.



Fix an outcome $I'_{k+1}$ that satisfies (d). We claim that the variation distance between (the
distribution of $(s_{k+1}, b_{k+1})$ under $\mathfrak{E}^{(k)}$) and (the distribution of $(s_{k+1}, b_{k+1})$ under $\mathfrak{E}^{(k+1)}$) is zero.
This is because, as described in the previous subsection, in both situations ($A^{(k)}$ or $A^{(k+1)}$) the
sample $s_{k+1}$ is uniformly selected from $I'_{k+1}$, and $b_{k+1}$ is either set to the previously determined value
(if a profile for this block was previously determined in an earlier step) or to a randomly selected
element of $\{\downarrow\uparrow, \uparrow\downarrow\}$ otherwise (see items 1 and 2, p. [76]). Thus in Case (d) both distributions ($\mathfrak{E}^{(k)}$
and $\mathfrak{E}^{(k+1)}$) over the $(k+1)$-th triple $(I_{k+1}, s_{k+1}, b_{k+1})$ are identical, conditioned on $I_{k+1} = I'_{k+1}$.

Now fix an outcome $I'_{k+1}$ that satisfies (e), so $|I'_{k+1}| > \Delta \cdot (\log N)^2$. Intuitively, the
probability that in either setting ($A^{(k)}$ or $A^{(k+1)}$) the point $s_{k+1}$ belongs to either one of the "already
touched" blocks (blocks that contain some previous sample $s_i$, $i \le k$), or to one of the outermost
blocks that $I'_{k+1}$ overlaps but does not fully contain, is very small. (In more detail, an analysis
that is essentially the same as that of (iii) and (iv) in Section [8.2.2] gives that the random choice of
$s_{k+1}$ (in either setting) hits a "previously touched" block or one of the two outermost blocks with
probability at most $O(k/(\log N)^2)$.) Conditioned on this not occurring, in both settings $s_{k+1}$ is
uniformly distributed among all other blocks, and the discussion of the previous subsection implies
that $b_{k+1}$ sets the half-block that $s_{k+1}$ falls into to high (respectively, low) with probability
$\frac{1 + 2\epsilon}{2}$ (respectively, $\frac{1 - 2\epsilon}{2}$), so the variation distance between the two distributions over $(s_{k+1}, b_{k+1})$
under this conditioning is zero. Applying Fact [32], we get that the two distributions over triples
$(I'_{k+1}, s_{k+1}, b_{k+1})$ have variation distance at most $O(k/(\log N)^2)$. Averaging over all possible
outcomes of $I_{k+1}$, we get that the two distributions over the $(k+1)$-st component $(I_{k+1}, s_{k+1}, b_{k+1})$
have variation distance at most $\beta(k, N) = O(k/(\log N)^2)$. This establishes Lemma [33] and concludes
the proof of the theorem. ■



9 Conclusion 

We have introduced a new conditional sampling framework for testing probability distributions and 
shown that it allows significantly more query-efficient algorithms than the standard framework for 
a range of problems. This new framework presents many potential directions for future work. 

One specific goal is to strengthen the upper and lower bounds for problems studied in this paper.
As a concrete question along these lines, we conjecture that COND algorithms for testing equality
of two unknown distributions $D_1$ and $D_2$ over $[N]$ require $(\log N)^{\Omega(1)}$ queries. A broader goal is to
study more properties of distributions beyond those considered in this paper; natural candidates
here, which have been well-studied in the standard model, are monotonicity (for which we have
preliminary results), independence between marginals of a joint distribution, and entropy. Yet
another goal is to study distributions over other structured domains such as the Boolean hypercube
$\{0, 1\}^n$ - here it would seem natural to consider "subcube" queries, analogous to the ICOND queries
we considered when the structured domain is the linearly ordered set $[N]$. A final broad goal is to
study distribution learning (rather than testing) problems in the conditional sampling framework.